This is an original PyTorch implementation of the ExORL framework from
Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning by
Denis Yarats*, David Brandfonbrener*, Hao Liu, Misha Laskin, Pieter Abbeel, Alessandro Lazaric, and Lerrel Pinto.
*Equal contribution.
Install MuJoCo if it is not already installed:
- Download the MuJoCo binaries here.
- Unzip the downloaded archive into `~/.mujoco/`.
- Append the MuJoCo `bin` subdirectory path to the `LD_LIBRARY_PATH` environment variable, e.g. as sketched below.
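A minimal sketch of the last step, assuming the archive unpacked into `~/.mujoco/mujoco210` (the subdirectory name depends on the MuJoCo version you downloaded):
```sh
# adjust mujoco210 to match the subdirectory your MuJoCo archive actually produced
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco210/bin
```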
Install the following libraries:
```sh
sudo apt update
sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3 unzip
```
Install dependencies:
```sh
conda env create -f conda_env.yml
conda activate exorl
```
We provide exploratory datasets for 6 DeepMind Control Suite domains:
| Domain | Dataset name | Available task names |
|---|---|---|
| Cartpole | `cartpole` | `cartpole_balance`, `cartpole_balance_sparse`, `cartpole_swingup`, `cartpole_swingup_sparse` |
| Cheetah | `cheetah` | `cheetah_run`, `cheetah_run_backward` |
| Jaco Arm | `jaco` | `jaco_reach_top_left`, `jaco_reach_top_right`, `jaco_reach_bottom_left`, `jaco_reach_bottom_right` |
| Point Mass Maze | `point_mass_maze` | `point_mass_maze_reach_top_left`, `point_mass_maze_reach_top_right`, `point_mass_maze_reach_bottom_left`, `point_mass_maze_reach_bottom_right` |
| Quadruped | `quadruped` | `quadruped_walk`, `quadruped_run` |
| Walker | `walker` | `walker_stand`, `walker_walk`, `walker_run` |
For each domain we collected datasets by running 9 unsupervised RL algorithms from URLB for a total of 10M steps. Here is the list of algorithms:
| Unsupervised RL method | Name | Paper |
|---|---|---|
| APS | `aps` | paper |
| APT(ICM) | `icm_apt` | paper |
| DIAYN | `diayn` | paper |
| Disagreement | `disagreement` | paper |
| ICM | `icm` | paper |
| ProtoRL | `proto` | paper |
| Random | `random` | N/A |
| RND | `rnd` | paper |
| SMM | `smm` | paper |
You can download a dataset by running `./download.sh <DOMAIN> <ALGO>`. For example, to download the ProtoRL dataset for Walker, run
```sh
./download.sh walker proto
```
The script will download the dataset from S3 and store it under `datasets/walker/proto/`, where you can find episodes (under `buffer`) and episode videos (under `video`).
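If you plan to compare several exploration algorithms, you can loop over the domain/algorithm pairs you need; a minimal sketch using the names from the tables above (the particular choice of domains and algorithms here is just an example):
```sh
# example bulk download: ProtoRL and RND datasets for Walker and Cheetah
for domain in walker cheetah; do
  for algo in proto rnd; do
    ./download.sh "$domain" "$algo"
  done
done
```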
We also provide implementations of 5 offline RL algorithms for evaluating the datasets:
| Offline RL method | Name | Paper |
|---|---|---|
| Behavior Cloning | `bc` | paper |
| CQL | `cql` | paper |
| CRR | `crr` | paper |
| TD3+BC | `td3_bc` | paper |
| TD3 | `td3` | paper |
After downloading the required datasets, you can evaluate them with an offline RL method on a specific task. For example, to evaluate a dataset collected by ProtoRL on Walker for the walking task using TD3+BC, run
```sh
python train_offline.py agent=td3_bc expl_agent=proto task=walker_walk
```
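The same dataset can be evaluated with any of the offline RL methods above by changing the `agent` argument; a minimal sketch of such a sweep (method names taken from the table above):
```sh
# example sweep: evaluate every offline RL method on the ProtoRL walker_walk dataset
for agent in bc cql crr td3_bc td3; do
  python train_offline.py agent=$agent expl_agent=proto task=walker_walk
done
```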
Logs are stored in the `output` folder. To launch tensorboard run:
```sh
tensorboard --logdir output
```
If you use this repo in your research, please consider citing the paper as follows:
```
@article{yarats2022exorl,
  title={Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning},
  author={Denis Yarats and David Brandfonbrener and Hao Liu and Michael Laskin and Pieter Abbeel and Alessandro Lazaric and Lerrel Pinto},
  journal={arXiv preprint arXiv:2201.13425},
  year={2022}
}
```
The majority of ExORL is licensed under the MIT license; however, portions of the project are available under separate license terms: the DeepMind code is licensed under the Apache 2.0 license.