VIDEO-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning
This is the official implementation for Video-RTS.
Authors: Ziyang Wang*, Jaehong Yoon*, Shoubin Yu, Md Mohaiminul Islam, Gedas Bertasius, Mohit Bansal
We introduce Video-RTS, a new approach that improves video reasoning capability with drastically better data efficiency by combining data-efficient RL with a video-adaptive test-time scaling (TTS) strategy.
git clone https://github.com/Ziyang412/Video-RTS.git
cd Video-RTS
# build environment
conda create -n video-rts python=3.11
conda activate video-rts
bash setup.sh
# qwen video extraction setting, e.g., max frames, resolutions
# Use the [decord] feature to improve speed
cd src/qwen-vl-utils
pip install -e .[decord]
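
The video extraction settings mentioned in the comment above (max frames, resolution) are typically passed through the qwen-vl-utils message schema when preparing model inputs. Below is a minimal sketch of that usage; the keys (nframes, max_pixels) and the video path are illustrative, so please check the pinned qwen-vl-utils source for the exact options supported in this repo.

# Minimal sketch (assumed keys): controlling sampled frames / resolution via the qwen-vl-utils message schema.
from qwen_vl_utils import process_vision_info

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "video",
                "video": "file:///path/to/video.mp4",  # placeholder path
                "nframes": 32,            # cap on sampled frames (assumed key)
                "max_pixels": 360 * 420,  # cap on per-frame resolution (assumed key)
            },
            {"type": "text", "text": "Describe the video."},
        ],
    }
]

# Decodes the video with the decord-accelerated backend installed above.
image_inputs, video_inputs = process_vision_info(messages)
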
cd ..

Following Video-R1, please install the provided version of transformers:
unzip transformers-main.zip
cd ./transformers-main
pip install .

Please refer to the official GitHub page of each dataset for video downloading.
For evaluation, we provide the annotation files in ./src/r1-v/Evaluation; please refer to ./src/r1-v/Evaluation/path_coversion.py to update the video paths.
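
For reference, here is a rough, hypothetical sketch of what the path update amounts to; the annotation file name, the video_path field, and the video root are illustrative assumptions, and the actual logic lives in path_coversion.py.

import json

# Hypothetical sketch of remapping video paths in an evaluation annotation file.
# File name, field name ("video_path"), and video root are illustrative assumptions;
# follow ./src/r1-v/Evaluation/path_coversion.py for the actual conversion.
ann_file = "./src/r1-v/Evaluation/example_annotations.json"  # placeholder
new_video_root = "/data/videos"  # where the downloaded videos live

with open(ann_file) as f:
    annotations = json.load(f)

for item in annotations:
    filename = item["video_path"].split("/")[-1]  # keep the file name
    item["video_path"] = f"{new_video_root}/{filename}"  # swap in the local root

with open(ann_file, "w") as f:
    json.dump(annotations, f, indent=2)
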
For training, we provide the training data annotations in ./src/training_data; please refer to the CG-Bench repo for the video data.
We provide the model checkpoint on Hugging Face. Note that the model is trained on only about 2k samples, yet it yields performance similar to training on 6k samples.
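
Once the pinned transformers version above is installed, the checkpoint can be loaded like any Qwen2.5-VL-style model. The snippet below is a minimal sketch under that assumption; the checkpoint ID is a placeholder, so substitute the actual Hugging Face repo name.

import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

# Placeholder checkpoint ID -- replace with the released Video-RTS checkpoint on Hugging Face.
model_id = "ORG/Video-RTS"

# Assumes the checkpoint follows the Qwen2.5-VL interface shipped with the pinned transformers.
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)
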
We use Open-R1-Video as the training codebase. We provide our modified files in ./src/training_files; please replace the corresponding files in the original repo with them. You could also use Video-R1 as the training codebase; we find the results are similar.
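
For convenience, the file replacement could be scripted roughly as below. This sketch assumes ./src/training_files mirrors the directory layout of your Open-R1-Video clone, which this README does not guarantee, so please verify each destination before overwriting.

import shutil

# Hypothetical helper: overlay the modified training files onto a cloned Open-R1-Video repo.
# Assumes ./src/training_files mirrors the original repo layout (an assumption, not guaranteed).
src_dir = "./src/training_files"
dst_dir = "../Open-R1-Video"  # placeholder path to your clone of the training codebase

shutil.copytree(src_dir, dst_dir, dirs_exist_ok=True)
print(f"Copied modified training files from {src_dir} into {dst_dir}")
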
Please update the input model, input file, and output file names in the provided bash script. After running the inference code, update json_path in cal_results_acc.py to compute the final video reasoning accuracy.
bash src/video_rts_eval.sh
python src/cal_results_acc.py

We thank the developers of Open-R1-Video, Video-R1, Qwen2.5-VL, and TRL for their public code releases.
Please cite our paper if you use our models in your work:
@misc{wang2025videortsrethinkingreinforcementlearning,
title={Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning},
author={Ziyang Wang and Jaehong Yoon and Shoubin Yu and Md Mohaiminul Islam and Gedas Bertasius and Mohit Bansal},
year={2025},
eprint={2507.06485},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2507.06485},
}
