This repository contains a preliminary implementation of the paper "Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team". More tasks and settings will be released soon. Some additional Xolver logs are available here.
Md Tanzib Hosain, Salman Rahman, Md Kishor Morol, Md Rizwan Parvez
- We have added code (not yet cleaned up) for additional agentic benchmarks, including 2Wiki, Bamboogle, BrowseComp, GAIA, GPQA, Humanity's Last Exam (HLE), OSWorld, and SQuAD, in the test branch. SWE-Bench will be added soon. On all of these benchmarks, Xolver achieves new state-of-the-art (SoTA) results.
To clone the project, run:
git clone https://github.com/kagnlp/Xolver && cd Xolver
To install the dependencies into a new conda or Python virtual environment, run:
pip install -r requirements.txt
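The install command above assumes a fresh environment is already active. A minimal sketch of creating and activating one with Python's built-in venv module, assuming a POSIX shell and `python3` on the PATH (the `.venv` directory name and the conda environment name `xolver` are arbitrary choices, not names used by the project); run the pip command inside the activated environment:

```shell
# Create an isolated environment in ./.venv using the standard-library venv module.
python3 -m venv .venv
# Activate it for the current shell session (sets VIRTUAL_ENV and updates PATH).
. .venv/bin/activate

# Alternative with conda (the environment name "xolver" is an arbitrary choice):
#   conda create -n xolver python -y
#   conda activate xolver
```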
The code for running the GSM, AIME, MATH, and LiveCodeBench tasks is in the following subfolders:
- ./gsm/ contains code for running GSM
- ./aime/ contains code for running AIME
- ./math/ contains code for running MATH
- ./lcb/ contains code for running LiveCodeBench
GSM:
To run GSM with an API key, set the key here.
To run GSM with self-retrieval, leave the value here empty.
To run GSM with no retrieval, change the value here to "None", in addition to the step above.
To generate and evaluate answers for GSM problems with Xolver, run:
cd ./gsm && python gsm.py
AIME:
To run AIME with an API key, set the key here.
To run AIME with self-retrieval, leave the value here empty.
To run AIME with no retrieval, change the value here to "None", in addition to the step above.
To generate and evaluate answers for AIME problems with Xolver, run:
cd ./aime && python aime.py
MATH:
To run MATH with an API key, set the key here.
To run MATH with self-retrieval, leave the value here empty.
To run MATH with no retrieval, change the value here to "None", in addition to the step above.
To generate and evaluate answers for MATH problems with Xolver, run:
cd ./math && python math.py
LiveCodeBench:
To run LiveCodeBench with an API key, set the key here.
To run LiveCodeBench with self-retrieval, leave the value here empty.
To run LiveCodeBench with no retrieval, change the value here to "None", in addition to the step above.
To generate and evaluate answers for LiveCodeBench problems with Xolver, run:
cd ./lcb && python lcb.py
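Because the four benchmarks share the same layout (a ./task/ folder containing task.py), they can also be run back to back. A minimal sketch, assuming a POSIX shell, that it is started from the repository root, and that each script runs without extra arguments as in the per-task commands; the `failed` variable is an illustrative name, not part of the project:

```shell
# Run each benchmark in turn and remember which ones did not finish successfully.
failed=""
for task in gsm aime math lcb; do
  # The subshell keeps the cd local, so every iteration starts from the repo root.
  if ! (cd "./$task" && python "$task.py"); then
    failed="$failed $task"
  fi
done
# Print a short summary; "none" is shown when every task succeeded.
echo "failed tasks:${failed:- none}"
```

Running the tasks sequentially keeps API usage predictable; they could be parallelized, but that is left out of this sketch.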
Without Memory:
To generate and evaluate answers with Xolver (-), which uses a static retrieval corpus, remove EPISODIC MEMORY and run each task the same way (replace task with gsm, aime, math, or lcb):
cd ./task && python task.py
If you would like to cite the paper, please use the following BibTeX entry:
@article{hosain2025xolver,
title={Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team},
author={Md Tanzib Hosain and Salman Rahman and Md Kishor Morol and Md Rizwan Parvez},
journal={arXiv preprint},
year={2025}
}