- 2025.09.18 🎉 FAVOR-Bench has been accepted by the NeurIPS 2025 Datasets and Benchmarks Track!
- 2025.03.19 🌟 We released FAVOR-Bench, a new benchmark for fine-grained video motion understanding that spans both ego-centric and third-person perspectives, with comprehensive evaluation covering both close-ended QA tasks and open-ended descriptive tasks!
Multimodal Large Language Models (MLLMs) have shown impressive video content understanding capabilities but struggle with fine-grained motion comprehension. To comprehensively assess the motion understanding ability of existing MLLMs, we introduce FAVOR-Bench, which comprises 1,776 videos from both ego-centric and third-person perspectives and enables assessment through both close-ended and open-ended tasks. For close-ended evaluation, we carefully design 8,184 multiple-choice question-answer pairs spanning six distinct sub-tasks. For open-ended evaluation, we employ GPT-assisted evaluation and develop a novel cost-efficient LLM-free assessment method, the latter of which enhances benchmarking interpretability and accessibility. Comprehensive experiments with 21 state-of-the-art MLLMs reveal significant limitations in their ability to comprehend and describe detailed temporal dynamics of video motions. To alleviate this limitation, we further build FAVOR-Train, a dataset of 17,152 videos with fine-grained motion annotations. Finetuning Qwen2.5-VL on FAVOR-Train yields consistent improvements on motion-related tasks across TVBench, MotionBench and our FAVOR-Bench. These results demonstrate that FAVOR-Bench and FAVOR-Train provide valuable tools for the community to develop more powerful video understanding models.
Our dataset is under the CC-BY-NC-SA-4.0 license.
If you need to access and use our dataset, you must understand and agree: this dataset is for research purposes only and cannot be used for any commercial or other purposes. The user assumes full responsibility for any consequences arising from any other use or dissemination.
We do not own the copyright of any raw video files. Currently, we provide video access to researchers under the condition of acknowledging the above license. For the video data used, we respect and acknowledge any copyrights of the video authors. Therefore, for the TV series and animations used in the dataset, we have applied several preprocessing steps to minimize any potential impact on the original copyrights. These include reducing video resolution, segmenting videos into short clips (less than 10 seconds), and applying dimension adjustments.
If there is any infringement in FAVOR-Bench, please contact zhangl22@m.fudan.edu.cn or directly raise an issue. If necessary, we will replace the contested videos with sparsely sampled frame sets at adjusted resolutions. In cases where even frame retention is not permitted, we will maintain the annotation files while replacing the video content with meta-information or seeking alternative sources that are more reliable and risk-free.
We give an example of evaluating Qwen2.5-VL on the close-ended tasks of FAVOR-Bench below:
- Download the FAVOR-Bench videos and put all the mp4 files in one directory (for example, `./test_videos`).
- Install the required dependencies and download the checkpoints following the official repo.
- Run the inference code:

```bash
python inference_qa_qwen.py
```

The results will be written to a jsonl file in `./output_qa/` and the scores will be printed.
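For reference, the printed scores can also be recomputed offline from the output file. The sketch below assumes each jsonl line holds the sub-task name, the model's predicted option, and the ground-truth option under the hypothetical keys `task`, `pred`, and `answer`; check the actual field names written by `inference_qa_qwen.py` before relying on it.

```python
import json
from collections import defaultdict

def score_results(jsonl_path):
    """Recompute multiple-choice accuracy from an inference output file.

    Assumes (hypothetically) that each line is a JSON object with keys:
      "task"   - the close-ended sub-task name,
      "pred"   - the option letter predicted by the model,
      "answer" - the ground-truth option letter.
    """
    total, correct = 0, 0
    per_task = defaultdict(lambda: [0, 0])  # task -> [correct, total]
    with open(jsonl_path, "r", encoding="utf-8") as f:
        for line in f:
            item = json.loads(line)
            hit = item["pred"].strip().upper() == item["answer"].strip().upper()
            total += 1
            correct += hit
            per_task[item["task"]][0] += hit
            per_task[item["task"]][1] += 1
    print(f"Overall accuracy: {correct / total:.4f}")
    for task, (c, t) in sorted(per_task.items()):
        print(f"  {task}: {c / t:.4f}")

if __name__ == "__main__":
    score_results("./output_qa/results.jsonl")  # hypothetical output file name
```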
We give an example of the LLM-free evaluation below:
- Prepare the necessary environment; `nltk` and `sentence-transformers` are required:

```bash
pip install sentence-transformers nltk
```

- Enter the folder:

```bash
cd LLM-free
```

- Run the `LLM-free_step1_extract.ipynb` notebook; the extraction results will be generated.
- Run the comparison code; the scores will be generated:

```bash
python LLM-free_step2_compare.py
```
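As a rough illustration of the matching idea behind the LLM-free comparison (not the exact logic of `LLM-free_step2_compare.py`), the sketch below embeds predicted and ground-truth motion phrases with sentence-transformers and scores each prediction by its best cosine similarity against the references. The example phrases and the embedding model `all-MiniLM-L6-v2` are assumptions for illustration only.

```python
from sentence_transformers import SentenceTransformer, util

# Hypothetical example phrases; in practice these would come from the
# extraction results produced by LLM-free_step1_extract.ipynb.
predicted = ["the man raises his right arm", "she turns her head to the left"]
reference = ["a man lifts his right arm", "the woman looks to her left"]

# Any sentence-embedding model can be plugged in; all-MiniLM-L6-v2 is a light default.
model = SentenceTransformer("all-MiniLM-L6-v2")
pred_emb = model.encode(predicted, convert_to_tensor=True)
ref_emb = model.encode(reference, convert_to_tensor=True)

# For each predicted motion phrase, take the best match among the references.
sim = util.cos_sim(pred_emb, ref_emb)   # (num_pred, num_ref) similarity matrix
best = sim.max(dim=1).values            # best-matching reference per prediction
print("Per-phrase best similarity:", best.tolist())
print("Mean similarity:", best.mean().item())
```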
- Model Comparison:
- Benchmark Comparison:
- Benchmark Statistics:
If you find our work helpful for your research, please consider citing it:
```bibtex
@misc{tu2025favor,
      title={FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding},
      author={Chongjun Tu and Lin Zhang and Pengtao Chen and Peng Ye and Xianfang Zeng and Wei Cheng and Gang Yu and Tao Chen},
      year={2025},
      eprint={2503.14935},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```




