Benchmarking #1276

MischaPanch · 2025-10-16T08:59:47Z

This PR refactors the example scripts and reduces the parametrization of the high-level (HL) scripts to the minimum. The default config is restored to that of v0.5.0, except for the mujoco task versions, which have been bumped from 3 to 4.
The rliable eval code received various improvements.
The PR also adds the possibility to run and evaluate multiple experiments directly from an ExperimentBuilder. A new benchmarking script builds on this: it runs multiple scripts in parallel in tmux sessions, evaluates them with rliable, and aggregates the stats so that they can be displayed in the benchmarking section in the logs.

This concludes most of the preparations for establishing easily reproducible benchmarking runs.
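
The run, evaluate, aggregate, persist flow described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code added in this PR: `run_experiment` is a hypothetical stand-in for one training run, and the interquartile mean (the aggregate that rliable recommends) is computed by hand with the standard library so that the block stays self-contained.

```python
import json
import random
import statistics


def run_experiment(seed: int) -> float:
    """Hypothetical stand-in for one training run; returns a final test return."""
    rng = random.Random(seed)
    return 1000.0 + rng.gauss(0.0, 50.0)


def interquartile_mean(scores: list[float]) -> float:
    """Mean of the scores that remain after dropping the lowest and highest 25%."""
    ordered = sorted(scores)
    cut = int(0.25 * len(ordered))
    return statistics.fmean(ordered[cut : len(ordered) - cut])


def benchmark(seeds: list[int]) -> dict:
    """Run one experiment per seed and collect aggregate statistics."""
    scores = [run_experiment(seed) for seed in seeds]
    return {
        "num_experiments": len(scores),
        "seeds": seeds,
        "scores": scores,
        "mean": statistics.fmean(scores),
        "iqm": interquartile_mean(scores),
    }


# Persist the aggregated stats as JSON so that a separate tool can read them.
summary = benchmark(list(range(8)))
serialized = json.dumps(summary, indent=2)
```

Dropping the top and bottom quarter makes the aggregate less sensitive to a single outlier seed than the plain mean, which is why it is a common choice when only a handful of runs per task are available.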

MischaPanch added 2 commits August 1, 2025 13:34

Experiment: remove inheritance of DataclassPPrintMixin

b87aea4

Experiment is no longer a dataclass

Merge branch 'master' into benchmarking

56270cd

MischaPanch requested a review from opcode81 October 16, 2025 08:59

MischaPanch added 4 commits October 16, 2025 11:02

High level interface for launching and evaluating multiple experiments

baf4822

Directly uses the builder and an argument-parser friendly parametrization

Fixed missing passing of seed

a8e70b6

Minor

3f00231

Added low-level API example for multiple experiments with rliable eval

56ef25e

MischaPanch force-pushed the benchmarking branch from b391d88 to 56ef25e Compare October 16, 2025 09:02

MischaPanch added 8 commits October 17, 2025 14:44

Removed unneeded dedicated hl_multi example

ec344f7

Modified ppo_hl example, simplifying the config options and adding multi-experiment option

9234fcd

Simplifying the config options and adding multi-experiment option for all hl examples

82dc1ad

Improved and extended rliable evaluation module

ed362e8

Apart from general improvements, added the possibility to persist as json, to be used in benchmarking and visualization

Updated ruff, removed black, formatted

03bdfff

Fix enum instantiation by name

4f6f7fb
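
The commit message does not say what the enum bug was. As general background only, the sketch below shows the difference between looking up an Enum member by name and by value in Python, which is a common source of this kind of error when enum options arrive as command-line strings. `LauncherKind` and `enum_from_name` are hypothetical names introduced for illustration.

```python
from enum import Enum


class LauncherKind(Enum):
    """Hypothetical enum whose member names differ from their values."""

    SEQUENTIAL = "sequential"
    PARALLEL = "parallel"


def enum_from_name(enum_cls, name: str):
    """Look up a member by its name, with a readable error for unknown names."""
    try:
        # Subscripting an Enum class looks up by member NAME.
        return enum_cls[name]
    except KeyError:
        valid = ", ".join(member.name for member in enum_cls)
        raise ValueError(
            f"Unknown {enum_cls.__name__} name {name!r}; expected one of: {valid}"
        ) from None


by_name = enum_from_name(LauncherKind, "PARALLEL")
# Calling the Enum class looks up by member VALUE instead.
by_value = LauncherKind("parallel")
```

Calling `LauncherKind("PARALLEL")` would raise a `ValueError`, because `"PARALLEL"` is a member name and not a member value.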

Set default num_experiments to 1 in hl scripts

ff27ffd

Added a script for benchmarking

91029ad

MischaPanch force-pushed the benchmarking branch from f28c819 to 91029ad Compare October 23, 2025 10:25

Merge branch 'dev-v2' into benchmarking

7b4588d

Conflicts: poetry.lock pyproject.toml tianshou/highlevel/experiment.py

opcode81 force-pushed the benchmarking branch from c791897 to 7b4588d Compare October 24, 2025 14:22

opcode81 changed the base branch from master to dev-v2 October 24, 2025 14:24

MischaPanch added 7 commits October 24, 2025 19:42

Minor post-merge cleanup

e47367d

Added result aggregation to benchmarking

bd2f827

This form is expected by the visualization script

Extend benchmarking to run for all desired tasks

6c10201

Minor fixes in typing

e9afaeb

Removed no longer needed mujoco_ppo_multi.py

203ea48

We will include a separate example in a readme, no need for a script

Refactored mujoco low-level examples to use jsonargparse

b47ca52

Refactored atari low-level examples to use jsonargparse

8cf6576

MischaPanch force-pushed the benchmarking branch from 9f4b8c9 to 8cf6576 Compare October 25, 2025 12:28

MischaPanch added 2 commits October 25, 2025 15:10

Reinstating parameterization of v0.5.0 in mujoco hl scripts

91f6030

Renamed train_envs to training_envs

457b82e

MischaPanch changed the base branch from dev-v2 to master October 25, 2025 13:39

More parameterization in hl scripts, used in benchmarking

a07831f

MischaPanch changed the title from "Benchmarking - part 1" to "Benchmarking" October 25, 2025

MischaPanch added 3 commits October 27, 2025 14:06

Removed obsolete result aggregation script

6017fea

Bumped epochs for off-policy algos and switched launcher to joblib

b7e8b93

The bottleneck for off-policy algos is not rollout but training, so it makes sense to parallelize experiments by default

Automatically set test_step_num_episodes to num_test_envs by default

2bcb29f
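
One way to express a default that depends on another setting is to use `None` as a sentinel and resolve it after construction. The sketch below assumes a hypothetical `EvalConfig` container; only the two field names come from the commit message, and the actual implementation in this PR may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalConfig:
    """Hypothetical config holder; not the class used in the PR."""

    num_test_envs: int = 10
    # None means "derive from num_test_envs".
    test_step_num_episodes: Optional[int] = None

    def __post_init__(self) -> None:
        # Default to one test episode per test environment.
        if self.test_step_num_episodes is None:
            self.test_step_num_episodes = self.num_test_envs


derived = EvalConfig(num_test_envs=4)
explicit = EvalConfig(num_test_envs=4, test_step_num_episodes=12)
```

With this pattern, changing `num_test_envs` alone keeps the two values in sync, while an explicitly passed `test_step_num_episodes` is left untouched.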

MischaPanch force-pushed the benchmarking branch from fedf01f to c3cf417 Compare October 27, 2025 14:41

Benchmark script: minor improvement in tmux session counting

fcd48fc

MischaPanch force-pushed the benchmarking branch from c3cf417 to fcd48fc Compare October 27, 2025 14:45

MischaPanch added 7 commits October 29, 2025 13:30

More renamings of type train -> training

8590c7c

More renamings of type train -> training

770334d

Bugfix: passing save_interval in create_logger

75fc43c

Minor restructuring, improved defaults and docstrings in loggers

151ae1f

In particular, changed save_interval default to None so we don't save the policy after each epoch

Minor restructuring, improved defaults and docstrings in loggers

38c4b93

In particular, changed save_interval default to None so we don't save the policy after each epoch
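
The effect of a `None` default for `save_interval` can be illustrated with a small predicate. This is a sketch under assumptions: `should_save_checkpoint` is a hypothetical helper, not a function from the loggers touched by this commit, and it only models the "save every N epochs, or never" decision.

```python
from typing import Optional


def should_save_checkpoint(epoch: int, save_interval: Optional[int] = None) -> bool:
    """Return True if a checkpoint should be written after this epoch."""
    if save_interval is None:
        # Saving is disabled unless the caller opts in.
        return False
    if save_interval <= 0:
        raise ValueError("save_interval must be a positive integer or None")
    return epoch % save_interval == 0


# With the default, no epoch triggers a save.
saved_by_default = [e for e in range(1, 7) if should_save_checkpoint(e)]
# With save_interval=2, every second epoch triggers a save.
saved_every_two = [e for e in range(1, 7) if should_save_checkpoint(e, save_interval=2)]
```

Making saving opt-in avoids writing a checkpoint after every epoch when many experiments run in parallel, while callers who want checkpoints can still request them with an explicit interval.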

Less invasive logging on training

f368fe7

Configurable experiment launcher in run_benchmark.py

51493eb

MischaPanch force-pushed the benchmarking branch from 9d3853b to 51493eb Compare October 29, 2025 15:00

Minor

ae19d79
