A Julia package for easily downloading and accessing popular cheminformatics datasets.
using Pkg
Pkg.add("MoleculeDatasets")

using MoleculeDatasets
# Download and load a dataset
data = get_mol_dataset("esol")

See src/dataset_info.jl for the datasets defined in the package.
To add a new dataset to the package, edit the MOL_DATASETS dictionary in src/dataset_info.jl. Each dataset entry should include:
For local datasets:
"dataset_key" => Dict(
    "name" => "Dataset Display Name",
    "description" => "Brief description of the dataset",
    "filepath" => "data/filename.csv",
    "format" => "csv",
    "size" => "file size",
    "type" => "local",
    "reference" => "Full citation",
    "doi" => "DOI if available",
    "website" => "URL if available"
)

For remote datasets:
"dataset_key" => Dict(
    "name" => "Dataset Display Name",
    "description" => "Brief description of the dataset",
    "url" => "https://example.com/dataset.csv",
    "format" => "csv",
    "size" => "file size",
    "type" => "remote",
    "reference" => "Full citation",
    "doi" => "DOI if available",
    "website" => "URL if available"
)

get_mol_dataset(name; output_dir="data", force_download=false, verbose=true): Downloads a dataset and loads it as a DataFrame.
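
When adding an entry to MOL_DATASETS, it can help to check that none of the keys from the local and remote templates has been left out. The snippet below is a minimal sketch of such a check using only Base Julia. The names `COMMON_KEYS`, `missing_entry_keys`, and `example_entry` are hypothetical and are not part of MoleculeDatasets; whether the package performs any validation of its own is not stated here.

```julia
# Hypothetical helper (not part of MoleculeDatasets): reports which of the
# keys shown in the entry templates are absent from a dataset entry.

# Keys that appear in both the local and the remote template.
const COMMON_KEYS = ["name", "description", "format", "size", "type",
                     "reference", "doi", "website"]

function missing_entry_keys(entry::AbstractDict)
    required = copy(COMMON_KEYS)
    kind = get(entry, "type", nothing)
    if kind == "local"
        push!(required, "filepath")   # local entries point at a file path
    elseif kind == "remote"
        push!(required, "url")        # remote entries point at a URL
    end
    # Keep only the required keys that the entry does not define.
    return [k for k in required if !haskey(entry, k)]
end

# Placeholder entry following the remote template; all values are illustrative.
example_entry = Dict(
    "name" => "Dataset Display Name",
    "description" => "Brief description of the dataset",
    "url" => "https://example.com/dataset.csv",
    "format" => "csv",
    "size" => "file size",
    "type" => "remote",
    "reference" => "Full citation",
    "doi" => "DOI if available",
    "website" => "URL if available",
)

missing = missing_entry_keys(example_entry)
println(isempty(missing) ? "entry is complete" : "missing keys: $(missing)")
```

An empty result means the entry defines every key from the matching template. The check looks only at key presence, not at whether the file path or URL actually resolves.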