Official repository for "MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs".
🌟 For more details, please refer to the project page.
[🚀Project Page] [📖 Paper] [📊 Huggingface Dataset] [🏆 Leaderboard]
- [2025.06.17] 🔥 We have integrated MME-Reasoning into VLMEvalKit.
- [2025.05.23] 🔥 We launch MME-Reasoning, a comprehensive benchmark designed to evaluate the reasoning ability of MLLMs. We release the arXiv paper and all data samples in a Hugging Face dataset.
Logical reasoning is a fundamental aspect of human intelligence and an essential capability for multimodal large language models (MLLMs). Existing benchmarks fail to comprehensively evaluate MLLMs' reasoning abilities due to the lack of an explicit categorization of logical reasoning types and an unclear understanding of reasoning.
In this paper, we introduce MME-Reasoning, a comprehensive benchmark specifically designed to evaluate the reasoning capability of MLLMs. MME-Reasoning consists of 1,188 carefully curated questions that systematically cover the three types of logical reasoning (inductive, deductive, and abductive) while spanning a range of difficulty levels.
Experiments were conducted on state-of-the-art MLLMs, covering both Chat and Thinking models, open-source and closed-source alike. Evaluations with MME-Reasoning reveal the following key findings: (1) MLLMs exhibit significant limitations and pronounced imbalances in reasoning capabilities. (2) Abductive reasoning remains a major bottleneck for current MLLMs. (3) Reasoning length scales with task difficulty and benefits performance, but with diminishing returns and decreasing token efficiency. We hope MME-Reasoning serves as a foundation for advancing multimodal reasoning in MLLMs.
Inference using VLMEvalKit
Please first install VLMEvalKit following the official GitHub repository.
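If VLMEvalKit is not installed yet, a minimal setup sketch is shown below. It assumes the standard editable install from the open-compass/VLMEvalKit repository; please treat the official VLMEvalKit README as the authoritative reference.

```bash
# Minimal sketch: install VLMEvalKit from source
# (assumes the open-compass/VLMEvalKit repository and an editable pip install;
#  follow the official README if these steps differ for your setup)
git clone https://github.com/open-compass/VLMEvalKit.git
cd VLMEvalKit
pip install -e .
```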
Then, run:
```bash
python run.py --data MME-Reasoning --model TESTED_MODEL --verbose
```

We are working to integrate MME-Reasoning into existing VLM evaluation frameworks. For the current version of the evaluation, please follow the steps below:
- Set up your environment following VLMEvalKit.
- Download the MME-Reasoning data and metadata from Hugging Face.
- Set the environment variable `LMUData` (note that the images should exist under `$LMUData/MMEReasoning/images/`).
- Set the metadata path in `vlmeval/dataset/mmereasoning/mmereasoning.py` at line 19 and line 25.
- Run:

```bash
python run.py --data MMEReasoning --model your_model --mode infer --verbose
```
- Extract and judge the final results (the response file is located in the outputs directory and ends with `scores.xlsx`):

```bash
python test_mme_reasoning.py --file_path response_file
```
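For reference, the manual workflow above can be strung together as in the following sketch. The `LMUData` path and the name of the response file are illustrative placeholders; use the actual file ending in `scores.xlsx` produced under your outputs directory.

```bash
# Sketch of the manual MME-Reasoning evaluation workflow (paths are illustrative)
export LMUData=/path/to/LMUData   # images must live under $LMUData/MMEReasoning/images/

# 1. Run inference with VLMEvalKit
python run.py --data MMEReasoning --model your_model --mode infer --verbose

# 2. Extract answers and judge the results
#    (replace the argument with the actual response file ending in scores.xlsx under outputs/)
python test_mme_reasoning.py --file_path outputs/your_model/MMEReasoning_scores.xlsx
```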
🚀 The Leaderboard is continuously being updated, and we welcome contributions of your excellent MLLMs!
To contribute your model to the leaderboard, please email the prediction files to 📧jkyuan112@gmail.com or pengts521@gmail.com.
If you find MME-Reasoning useful for your research and applications, please kindly cite using this BibTeX:
```bibtex
@article{yuan2025mme,
  title={MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs},
  author={Yuan, Jiakang and Peng, Tianshuo and Jiang, Yilei and Lu, Yiting and Zhang, Renrui and Feng, Kaituo and Fu, Chaoyou and Chen, Tao and Bai, Lei and Zhang, Bo and others},
  journal={arXiv preprint arXiv:2505.21327},
  year={2025}
}
```
