This repo contains the evaluation code for the ACL 2025 Findings paper "Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models".
Introduction
We introduce MMIR, the first benchmark for evaluating Multimodal Large Language Models (MLLMs) on detecting and reasoning about inconsistencies in layout-rich multimodal content. MMIR features 534 challenging samples across five reasoning-heavy inconsistency categories:
Factual Contradiction, Identity Misattribution, Contextual Mismatch, Quantitative Discrepancy, and Temporal/Spatial Incoherence.
Dataset Creation
The MMIR benchmark was meticulously constructed through a four-stage curation pipeline to ensure high-quality, diverse, and challenging test cases. Please refer to our Hugging Face 🤗 Dataset for more details.
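For a quick look at the samples, the benchmark can presumably be pulled from the Hub with the `datasets` library. A minimal sketch follows; note that the dataset ID is a placeholder, since the exact Hub path is not stated here, so substitute the path from the 🤗 Dataset page linked above.

```python
from datasets import load_dataset

# Placeholder dataset ID; replace with the actual Hugging Face path
# from the MMIR dataset page.
ds = load_dataset("MMIR-Bench/MMIR", split="test")

print(len(ds))  # expected: 534 samples across the five categories
print(ds[0])    # inspect the fields of a single sample
```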
Citation
@misc{yan2025multimodalinconsistencyreasoningmmir,
      title={Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models},
      author={Qianqi Yan and Yue Fan and Hongquan Li and Shan Jiang and Yang Zhao and Xinze Guan and Ching-Chen Kuo and Xin Eric Wang},
      year={2025},
      eprint={2502.16033},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2502.16033},
}