A repository for storing, versioning, and validating relational datasets.
- A list of all versions is on the GitHub tags page
- A list of all datasets is on the relational-datasets download page
- Descriptions of each dataset are hosted with the relational-datasets documentation, for example: boston_housing overview

relational-datasets is a Python package that assists in loading and downloading data from this repository.
For example, you can load training and test sets for webkb fold-2 with:
# pip install relational-datasets
from relational_datasets import load
train, test = load("webkb", "v0.0.4", fold=2)

RelationalDatasets.jl is a Julia package that helps load and download data from this repository:
# ] add RelationalDatasets
using RelationalDatasets
train, test = load("webkb", "v0.0.4", fold=2)

Specific Version: Versions of each data archive may be downloaded by sending
requests to a URL with the following pattern, where {VERSION} is a release tag
and {NAME} is the name of a dataset:
https://github.com/srlearn/datasets/releases/download/{VERSION}/{NAME}_{VERSION}.zip
Download version v0.0.4 of toy_cancer:
curl -L https://github.com/srlearn/datasets/releases/download/v0.0.4/toy_cancer_v0.0.4.zip > toy_cancer_v0.0.4.zip

Download version v0.0.4 of webkb:
curl -L https://github.com/srlearn/datasets/releases/download/v0.0.4/webkb_v0.0.4.zip > webkb_v0.0.4.zip
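
The same URL pattern can also be filled in programmatically. The following is a minimal sketch using only the Python standard library; the helper names `archive_url` and `download_archive` are hypothetical and are not part of the relational-datasets package.

```python
import urllib.request
from pathlib import Path

# URL pattern quoted from the text above.
URL_PATTERN = (
    "https://github.com/srlearn/datasets/releases/download/"
    "{VERSION}/{NAME}_{VERSION}.zip"
)


def archive_url(name: str, version: str) -> str:
    """Fill the {NAME} and {VERSION} placeholders of the release URL pattern."""
    return URL_PATTERN.format(NAME=name, VERSION=version)


def download_archive(name: str, version: str, directory: str = ".") -> Path:
    """Download one archive (hypothetical helper) and return the saved path."""
    target = Path(directory) / f"{name}_{version}.zip"
    # urlopen follows HTTP redirects by default, similar to `curl -L`.
    with urllib.request.urlopen(archive_url(name, version)) as response:
        target.write_bytes(response.read())
    return target
```

Under these assumptions, `download_archive("toy_cancer", "v0.0.4")` would save `toy_cancer_v0.0.4.zip` in the current directory, mirroring the first curl command above.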