ReXTime is a benchmark designed to test AI models' temporal reasoning across video events, focusing on understanding cause-and-effect relationships between different video segments. It contains 921 validation samples and 2,143 test samples.
| Project Page | GitHub | 🤗 Huggingface Dataset | Leaderboard | Paper |
git clone https://github.com/ReXTime/ReXTime.git
cd ReXTime
git clone https://huggingface.co/datasets/ReXTime/ReXTime
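If you prefer not to clone the annotations with git, the Hugging Face `datasets` library can usually load a dataset repo directly. The sketch below is not part of the official pipeline; the split name "validation" and the printed fields are assumptions, so check the dataset card for the authoritative schema.

# Minimal sketch: load the ReXTime annotations with the Hugging Face `datasets` library.
# Split names and field names here are assumptions; see the dataset card for the real schema.
from datasets import load_dataset

rextime = load_dataset("ReXTime/ReXTime")   # downloads/loads the annotation files
print(rextime)                              # list the available splits
sample = rextime["validation"][0]           # assumed split name
print(sample)                               # inspect one annotation record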
- ActivityNet
Download the raw video data from the Download page on the official ActivityNet website. You need to fill in their request form to get 7-day access to download the videos from the drive folders. You can find the form here.
- QVHighlights
Download the raw video data from the link provided by Moment-DETR and extract the archive:
wget https://nlp.cs.unc.edu/data/jielei/qvh/qvhilights_videos.tar.gz
tar -xvzf qvhilights_videos.tar.gz
.
├── videos/                          # Path to the QVHighlights raw videos, can be anywhere.
│   ├── 9c_w8HU3hqc_210.0_360.0.mp4  # Video 1
│   └── efCSWDWjm6g_360.0_510.0.mp4  # Video 2
├── Anet_videos_15fps_short256/      # Path to the ActivityNet raw videos, can be anywhere.
│   ├── v_5R3h6lxne90.mp4            # Video 1
│   └── v_aQ-F9wr0HQ4.mp4            # Video 2
└── ReXTime/                         # Code repo
    ├── ReXTime/                     # Huggingface dataset repo
    ├── evaluation/                  # Evaluation code
    ├── demo/                        # Inference demo script
    ├── requirements.txt             # Packages for environment
    ...
conda create --name=rextime python=3.10 -y
conda activate rextime
pip install -r requirements.txt
We provide evaluation demos for both open-source and proprietary models. In the scripts below, modify the path to the dataset repo and the paths to the two raw-video directories. For proprietary model evaluation, you also need to fill in your API key.
Open-source MLLM demo:
python ./demo/inference.py \
--dataset_path ./ReXTime \
--anet_vid_dir ${Path to the ActivityNet video directory} \
--qvh_vid_dir ${Path to the QVHighlights video directory}
Proprietary MLLM demo:
OPENAI_API_KEY="sk-***********************************" python ./demo/request.py \
--dataset_path ./ReXTime \
--anet_vid_dir ${Path to the ActivityNet video directory} \
--qvh_vid_dir ${Path to the QVHighlights video directory}
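For reference, the proprietary-model flow roughly amounts to sampling frames from a video and sending them to the API together with the question. The sketch below only illustrates that flow and is not the contents of demo/request.py; the model name, frame count, and prompt are assumptions.

# Illustrative sketch of a proprietary-model request (NOT demo/request.py itself).
# Assumptions: OpenAI-style chat API, "gpt-4o" model, 8 uniformly sampled frames.
import base64, cv2
from openai import OpenAI  # requires OPENAI_API_KEY in the environment

def sample_frames_b64(video_path, num_frames=8):
    """Uniformly sample frames and return them as base64-encoded JPEGs."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for idx in range(0, total, max(total // num_frames, 1)):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if not ok:
            break
        ok, buf = cv2.imencode(".jpg", frame)
        if ok:
            frames.append(base64.b64encode(buf.tobytes()).decode("utf-8"))
    cap.release()
    return frames[:num_frames]

client = OpenAI()
frames = sample_frames_b64("videos/9c_w8HU3hqc_210.0_360.0.mp4")
content = [{"type": "text", "text": "Answer the multi-choice question and give the relevant time span."}]
content += [{"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{f}"}} for f in frames]
response = client.chat.completions.create(model="gpt-4o", messages=[{"role": "user", "content": content}])
print(response.choices[0].message.content)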
Below is an example of an output/submission file in .jsonl format. For the assessment of moment grounding, you only need to provide "qid" and "pred_relevant_windows". For the assessment of multi-choice VQA, you only need to provide "qid" and "ans". For the assessment of grounding VQA, you need to provide "qid", "pred_relevant_windows", and "ans"; the predicted answer should be conditioned on the predicted time span. A minimal writer sketch follows the example.
{"qid": "anet_val384", "pred_relevant_windows": [[0.0, 15.8304]], "ans": "A"}
{"qid": "qvh_val114", "pred_relevant_windows": [[0.0, 25.50]], "ans": "A"}
...
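A minimal sketch for producing such a submission file from your model's outputs; the `predictions` structure and the output file name are illustrative, not part of the official tooling.

# Minimal sketch: write predictions to a .jsonl submission file.
# `predictions` is a hypothetical structure that your own inference code would fill in.
import json

predictions = [
    {"qid": "anet_val384", "pred_relevant_windows": [[0.0, 15.8304]], "ans": "A"},
    {"qid": "qvh_val114", "pred_relevant_windows": [[0.0, 25.50]], "ans": "A"},
]

with open("submission.jsonl", "w") as f:
    for pred in predictions:
        f.write(json.dumps(pred) + "\n")  # one JSON object per line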
Modify the file paths in the following and run:
python ./evaluation/rextime_eval.py \
--submission_path ${submission_path} \
--gt_path ${gt_path} \
--save_path ${save_path}
We only provide the ground-truth file for the validation set, at 'data/rextime_val.jsonl'. To evaluate on the test set, please submit your prediction file to the ReXTime Leaderboard.
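Before submitting, a quick sanity check against the validation ground truth can catch malformed entries. This is a rough sketch, not part of the official evaluation; the file paths and the assumption that the ground-truth file also stores a "qid" field are ours.

# Rough sanity check for a submission file (illustrative only).
import json

def load_jsonl(path):
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

gt_qids = {item["qid"] for item in load_jsonl("ReXTime/data/rextime_val.jsonl")}  # assumed path/field
subs = load_jsonl("submission.jsonl")

missing = gt_qids - {s["qid"] for s in subs}
print(f"{len(subs)} predictions, {len(missing)} validation qids missing")
for s in subs:
    for start, end in s.get("pred_relevant_windows", []):
        assert 0.0 <= start <= end, f"bad window in {s['qid']}"  # windows must be ordered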
- The evaluation code is built on Moment-DETR.
- The inference code is built on Video-LLaVA.
The annotation files are released under the CC BY-NC-SA 4.0 license. All the code is under the MIT license; see LICENSE.
BibTeX:
@article{chen2024rextime,
title={ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos},
author={Chen, Jr-Jen and Liao, Yu-Chien and Lin, Hsi-Che and Yu, Yu-Chu and Chen, Yen-Chun and Wang, Yu-Chiang Frank},
journal={arXiv preprint arXiv:2406.19392},
year={2024}
}
