14/04/2025: We released our code, models, and data. The paper will be available soon.
14/04/2025: Among 7B PRMs, our model sail/ActPRM-X (based on Qwen/Qwen2.5-Math-PRM-7B) achieves new SOTA performance on ProcessBench (76.0%) and PRMBench (66.7%).
🏴 Overview
TL;DR: We achieve SOTA performance on ProcessBench (75.0%) and PRMBench (65.5%) with only ~5% of the labeling cost of Qwen/Qwen2.5-Math-PRM-7B.
📊 Results
[Figures: results on ProcessBench and PRMBench]
⚡️ Quickstart
Installation
git clone https://github.com/sail-sg/ActivePRM.git
cd ActivePRM
pip install -e .  # tested in a conda env with python==3.11
Replication
Evaluate our sail/ActPRM-X and sail/ActPRM on ProcessBench by running:
cd examples
python py_scripts/test_actprm_on_processbench.py
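ProcessBench scores a PRM by whether it locates the earliest erroneous step in a solution. As a rough illustration of that metric (not the repo's actual evaluation code), here is a minimal sketch assuming the PRM emits a per-step probability of correctness and a hypothetical 0.5 decision threshold:

```python
def first_error_step(step_probs, threshold=0.5):
    """Return the index of the first step whose predicted probability
    of being correct falls below `threshold`, or -1 if every step is
    judged correct (the ProcessBench-style prediction target)."""
    for i, p in enumerate(step_probs):
        if p < threshold:
            return i
    return -1

# A solution whose third step looks wrong to the PRM:
print(first_error_step([0.98, 0.91, 0.12, 0.40]))  # -> 2
# A fully correct solution:
print(first_error_step([0.97, 0.95, 0.93]))        # -> -1
```

The function names and threshold here are illustrative only; see `py_scripts/test_actprm_on_processbench.py` for the real evaluation.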
Training PRM with Active Learning
cd examples
bash scripts/pool_based_active_learning.sh sail/ActPRMData
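The core idea of pool-based active learning is to spend the labeling budget on the unlabeled examples the current model is least certain about. The sketch below illustrates one common selection criterion (binary-entropy uncertainty sampling) with hypothetical names; it is a conceptual illustration, not the repo's implementation:

```python
import math

def select_for_labeling(pool_probs, budget):
    """Pool-based uncertainty sampling: pick the `budget` pool items
    whose predicted correctness probability is most uncertain, i.e.
    whose binary entropy is closest to its maximum at p = 0.5."""
    def entropy(p):
        if p in (0.0, 1.0):
            return 0.0
        return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

    # Rank pool indices by entropy, most uncertain first.
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: entropy(pool_probs[i]),
                    reverse=True)
    return ranked[:budget]

# Predictions near 0.5 are the most informative to send for labeling:
picked = select_for_labeling([0.99, 0.52, 0.10, 0.48, 0.95], budget=2)
print(picked)  # the two items with probabilities 0.52 and 0.48
```

In an active-learning loop, the selected items would be labeled (e.g. by a stronger judge), added to the training set, and the PRM retrained; `scripts/pool_based_active_learning.sh` drives that full loop over `sail/ActPRMData`.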
Citation
If you find our repo or paper helpful, please cite:
@misc{duan2025actprm,
title={Efficient Process Reward Model Training via Active Learning},
author={Keyu Duan and Zichen Liu and Xin Mao and Tianyu Pang and Changyu Chen and Qiguang Chen and Michael Qizhe Shieh and Longxu Dou},
year={2025},
eprint={2504.10559},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2504.10559},
}