- 2025.09.18 🎉 FAVOR-Bench has been accepted by the NeurIPS 2025 Datasets and Benchmarks Track!
- 2025.03.19 🌟 We released FAVOR-Bench, a new benchmark for fine-grained video motion understanding that spans both ego-centric and third-person perspectives, with comprehensive evaluation covering both close-ended QA tasks and open-ended descriptive tasks!
Multimodal Large Language Models (MLLMs) have shown impressive video content understanding capabilities but struggle with fine-grained motion comprehension. To comprehensively assess the motion understanding ability of existing MLLMs, we introduce FAVOR-Bench, which comprises 1,776 videos from both ego-centric and third-person perspectives and enables assessment through both close-ended and open-ended tasks. For close-ended evaluation, we carefully design 8,184 multiple-choice question-answer pairs spanning six distinct sub-tasks. For open-ended evaluation, we employ GPT-assisted evaluation and develop a novel cost-efficient LLM-free assessment method, the latter of which enhances benchmarking interpretability and accessibility. Comprehensive experiments with 21 state-of-the-art MLLMs reveal significant limitations in their ability to comprehend and describe detailed temporal dynamics of video motions. To alleviate this limitation, we further build FAVOR-Train, a dataset of 17,152 videos with fine-grained motion annotations. Finetuning Qwen2.5-VL on FAVOR-Train yields consistent improvements on motion-related tasks across TVBench, MotionBench and our FAVOR-Bench. These results demonstrate that FAVOR-Bench and FAVOR-Train provide valuable tools for the community to develop more powerful video understanding models.
Our dataset is under the CC-BY-NC-SA-4.0 license.
If you need to access and use our dataset, you must understand and agree: this dataset is for research purposes only and cannot be used for any commercial or other purposes. The user assumes full responsibility for any consequences arising from any other use or dissemination.
We do not own the copyright of any raw video files. Currently, we provide video access to researchers under the condition of acknowledging the above license. For the video data used, we respect and acknowledge any copyrights of the video authors. Therefore, for the TV series and animations used in the dataset, we have applied several preprocessing steps to minimize any potential impact on the original copyrights. These include reducing video resolution, segmenting videos into short clips (less than 10 seconds), and applying dimension adjustments.
If there is any infringement in FAVOR-Bench, please contact zhangl22@m.fudan.edu.cn or directly raise an issue. If necessary, we will replace the contested videos with sparsely sampled frame sets at adjusted resolutions. In cases where even frame retention is not permitted, we will maintain the annotation files while replacing the video content with meta-information or seeking alternative sources that are more reliable and risk-free.
We give an example of evaluating Qwen2.5-VL on the close-ended tasks of FAVOR-Bench below:
- Download the FAVOR-Bench videos and put all the mp4 files in one directory (for example, `./test_videos`).
- Install the required dependencies and download the checkpoints following the official repo.
- Run the inference code:

```bash
python inference_qa_qwen.py
```

The results will be written to a jsonl file in `./output_qa/` and the scores will be printed.
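For reference, the printed scores can also be recomputed offline from the output file. The sketch below assumes each jsonl line holds the sub-task name, the model's predicted option, and the ground-truth option under the hypothetical keys `task`, `pred`, and `answer`; check the actual field names written by `inference_qa_qwen.py` before relying on it.

```python
import json
from collections import defaultdict

def score_results(jsonl_path):
    """Recompute multiple-choice accuracy from an inference output file.

    Assumes (hypothetically) that each line is a JSON object with keys:
      "task"   - the close-ended sub-task name,
      "pred"   - the option letter predicted by the model,
      "answer" - the ground-truth option letter.
    """
    total, correct = 0, 0
    per_task = defaultdict(lambda: [0, 0])  # task -> [correct, total]
    with open(jsonl_path, "r", encoding="utf-8") as f:
        for line in f:
            item = json.loads(line)
            hit = item["pred"].strip().upper() == item["answer"].strip().upper()
            total += 1
            correct += hit
            per_task[item["task"]][0] += hit
            per_task[item["task"]][1] += 1
    print(f"Overall accuracy: {correct / total:.4f}")
    for task, (c, t) in sorted(per_task.items()):
        print(f"  {task}: {c / t:.4f}")

if __name__ == "__main__":
    score_results("./output_qa/results.jsonl")  # hypothetical output file name
```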
We give an example of the LLM-free evaluation below:
- Prepare the necessary environment; `nltk` and `sentence-transformers` are required:

```bash
pip install sentence-transformers nltk
```

- Enter the folder:

```bash
cd LLM-free
```

- Run the `LLM-free_step1_extract.ipynb` notebook; the extraction results will be generated.
- Run the comparison code; the scores will be generated:

```bash
python LLM-free_step2_compare.py
```
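As a rough illustration of the matching idea behind the LLM-free comparison (not the exact logic of `LLM-free_step2_compare.py`), the sketch below embeds predicted and ground-truth motion phrases with sentence-transformers and scores each prediction by its best cosine similarity against the references. The example phrases and the embedding model `all-MiniLM-L6-v2` are assumptions for illustration only.

```python
from sentence_transformers import SentenceTransformer, util

# Hypothetical example phrases; in practice these would come from the
# extraction results produced by LLM-free_step1_extract.ipynb.
predicted = ["the man raises his right arm", "she turns her head to the left"]
reference = ["a man lifts his right arm", "the woman looks to her left"]

# Any sentence-embedding model can be plugged in; all-MiniLM-L6-v2 is a light default.
model = SentenceTransformer("all-MiniLM-L6-v2")
pred_emb = model.encode(predicted, convert_to_tensor=True)
ref_emb = model.encode(reference, convert_to_tensor=True)

# For each predicted motion phrase, take the best match among the references.
sim = util.cos_sim(pred_emb, ref_emb)   # (num_pred, num_ref) similarity matrix
best = sim.max(dim=1).values            # best-matching reference per prediction
print("Per-phrase best similarity:", best.tolist())
print("Mean similarity:", best.mean().item())
```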
- Model Comparison:
- Benchmark Comparison:
- Benchmark Statistics:
If you find our work helpful for your research, please consider citing it:
```bibtex
@misc{tu2025favor,
      title={FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding},
      author={Chongjun Tu and Lin Zhang and Pengtao Chen and Peng Ye and Xianfang Zeng and Wei Cheng and Gang Yu and Tao Chen},
      year={2025},
      eprint={2503.14935},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```




