rLLM is an open-source framework for post-training language agents via reinforcement learning. With rLLM, you can easily build your custom agents and environments, train them with reinforcement learning, and deploy them for real-world workloads.
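To make the agent/environment/training loop concrete, here is a self-contained toy sketch of that pattern. It is purely illustrative and does not use rLLM's actual classes or API; see the repository's examples for the real entry points.

```python
# A conceptual, self-contained sketch of the agent/environment abstraction that
# an RL post-training framework like rLLM builds on. Everything here is a toy
# stand-in written for illustration; it does not use rLLM's actual classes.
import random

class GuessEnv:
    """Toy environment: reward 1.0 if the agent answers 'answer: 4'."""

    def reset(self) -> str:
        return "What is 2 + 2? Reply as 'answer: <number>'."

    def step(self, action: str):
        reward = 1.0 if action.strip() == "answer: 4" else 0.0
        return reward, True  # (reward, done)

class RandomAgent:
    """Toy policy: guesses a digit; a real agent would query an LLM policy."""

    def act(self, observation: str) -> str:
        return f"answer: {random.randint(0, 9)}"

def rollout(agent: RandomAgent, env: GuessEnv):
    obs = env.reset()
    action = agent.act(obs)
    reward, _done = env.step(action)
    return obs, action, reward

env, agent = GuessEnv(), RandomAgent()
trajectories = [rollout(agent, env) for _ in range(8)]
print("mean reward:", sum(r for *_, r in trajectories) / len(trajectories))
```

In an actual RL run, the collected trajectories and rewards would be fed to a policy-gradient trainer instead of simply being averaged.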
[2025/07/01] We release DeepSWE-Preview, a 32B software engineering (SWE) agent trained purely with RL that achieves 59% on SWE-Bench-Verified with test-time scaling (42.2% Pass@1), topping the SWE-Bench-Verified leaderboard for open-weight models.
- An In-Depth Blog Post on our SWE Agents and RL Training Recipes
- HF Model: DeepSWE-Preview
- HF Dataset: R2E-Gym-Subset
- Training Scripts
- Wandb Training Logs – All training runs and ablations.
- Evaluation Logs – 16 passes over SWE-Bench-Verified.
[2025/04/08] We release DeepCoder-14B-Preview, a 14B coding model that achieves an impressive 60.6% Pass@1 accuracy on LiveCodeBench (+8% improvement), matching the performance of o3-mini-2025-01-31 (Low) and o1-2024-12-17.
- An In-Depth Blog Post on our Training Recipe and Insights
- HF Models: DeepCoder-14B-Preview, DeepCoder-1.5B-Preview
- HF Dataset: DeepCoder-Preview-Dataset
- Training Scripts – Exact hyperparameters we used to achieve o3-mini performance.
- Wandb Training Logs – All training runs and ablations.
- Evaluation Logs – LiveCodeBench and Codeforces logs for DeepCoder.
[2025/02/10] We release DeepScaleR-1.5B-Preview, a 1.5B model that surpasses O1-Preview and achieves 43.1% Pass@1 on AIME. We achieve this by iteratively scaling DeepSeek's GRPO algorithm from 8K→16K→24K context length for thinking (a rough sketch of this staged schedule follows the list below).
- An In-Depth Blog Post on our Training Recipe and Insights
- HF Model: DeepScaleR-1.5B-Preview
- HF Dataset: DeepScaleR-Preview-Dataset / JSON Dataset
- Training Scripts – Exact hyperparameters we used to achieve 43.1% on AIME.
- Wandb Training Logs – All training runs and ablations.
  - Due to Wandb migration bugs, the 8K training run is compressed to 400-500 steps. The data is identical, but our original run was 1600 steps.
- Evaluation Logs – DeepScaleR, DeepSeek Distill, and Still 1.5B generations over 1000+ math problems.
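As a rough sketch of the iterative context-length scaling mentioned above: training proceeds in stages, each resuming from the previous stage's checkpoint with a longer response budget. The script path, config keys, and checkpoint paths below are hypothetical placeholders for illustration, not the exact scripts or options shipped in this repository.

```python
# Illustrative staged schedule for iterative context-length scaling
# (8K -> 16K -> 24K). All paths and config keys here are hypothetical.
import subprocess

stages = [
    {"max_response_length": 8192,  "resume_from": None},
    {"max_response_length": 16384, "resume_from": "ckpts/stage_8k"},
    {"max_response_length": 24576, "resume_from": "ckpts/stage_16k"},
]

for stage in stages:
    cmd = [
        "bash", "scripts/train/run_deepscaler_1.5b.sh",  # placeholder script
        f"data.max_response_length={stage['max_response_length']}",
    ]
    if stage["resume_from"]:
        cmd.append(f"trainer.resume_from={stage['resume_from']}")  # placeholder key
    subprocess.run(cmd, check=True)
```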
# Clone the repository
git clone --recurse-submodules https://github.com/agentica-project/rllm.git
cd rllm
# Create a conda environment
conda create -n rllm python=3.10
conda activate rllm
# Install all dependencies
pip install -e ./verl
pip install -e .
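After installation, the released checkpoints can be used like any other Hugging Face causal LM. A minimal sketch, assuming the models are hosted under the agentica-org organization (adjust the repo id if it differs):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed HF repo id; adjust if the model is hosted elsewhere.
model_id = "agentica-org/DeepScaleR-1.5B-Preview"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" requires the `accelerate` package.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

prompt = "What is the sum of the first 100 positive integers?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```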
- Our training experiments are powered by our heavily modified fork of verl, an open-source RLHF library.
- Our models are trained on top of DeepSeek-R1-Distill-Qwen-1.5B, DeepSeek-R1-Distill-Qwen-14B, and Qwen3-32B.
- Our work is done as part of Berkeley Sky Computing Lab, Berkeley AI Research, and a successful collaboration with Together AI.
Citing rLLM:
@misc{rllm2025,
title={rLLM: A Framework for Post-Training Language Agents},
author={Sijun Tan and Michael Luo and Colin Cai and Tarun Venkat and Kyle Montgomery and Aaron Hao and Tianhao Wu and Arnav Balyan and Manan Roongta and Chenguang Wang and Li Erran Li and Raluca Ada Popa and Ion Stoica},
year={2025},
howpublished={\url{https://pretty-radio-b75.notion.site/rLLM-A-Framework-for-Post-Training-Language-Agents-21b81902c146819db63cd98a54ba5f31}},
note={Notion Blog}
}
Citing DeepSWE:
@misc{deepswe2025,
title={DeepSWE: Training a State-of-the-Art Coding Agent from Scratch by Scaling RL},
author={Michael Luo and Naman Jain and Jaskirat Singh and Sijun Tan and Ameen Patel and Qingyang Wu and Alpay Ariyak and Colin Cai and Tarun Venkat and Shang Zhu and Ben Athiwaratkun and Manan Roongta and Ce Zhang and Li Erran Li and Raluca Ada Popa and Koushik Sen and Ion Stoica},
howpublished={\url{https://pretty-radio-b75.notion.site/DeepSWE-Training-a-Fully-Open-sourced-State-of-the-Art-Coding-Agent-by-Scaling-RL-22281902c1468193aabbe9a8c59bbe33}},
note={Notion Blog},
year={2025}
}
Citing DeepCoder:
@misc{deepcoder2025,
title={DeepCoder: A Fully Open-Source 14B Coder at O3-mini Level},
author={Michael Luo and Sijun Tan and Roy Huang and Ameen Patel and Alpay Ariyak and Qingyang Wu and Xiaoxiang Shi and Rachel Xin and Colin Cai and Maurice Weber and Ce Zhang and Li Erran Li and Raluca Ada Popa and Ion Stoica},
howpublished={\url{https://pretty-radio-b75.notion.site/DeepCoder-A-Fully-Open-Source-14B-Coder-at-O3-mini-Level-1cf81902c14680b3bee5eb349a512a51}},
note={Notion Blog},
year={2025}
}
Citing DeepScaleR:
@misc{deepscaler2025,
title={DeepScaleR: Surpassing O1-Preview with a 1.5B Model by Scaling RL},
author={Michael Luo and Sijun Tan and Justin Wong and Xiaoxiang Shi and William Y. Tang and Manan Roongta and Colin Cai and Jeffrey Luo and Li Erran Li and Raluca Ada Popa and Ion Stoica},
year={2025},
howpublished={\url{https://pretty-radio-b75.notion.site/DeepScaleR-Surpassing-O1-Preview-with-a-1-5B-Model-by-Scaling-RL-19681902c1468005bed8ca303013a4e2}},
note={Notion Blog}
}