rLLM is an open-source framework for post-training language agents via reinforcement learning. With rLLM, you can easily build your custom agents and environments, train them with reinforcement learning, and deploy them for real-world workloads.
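To make the agent/environment/training loop concrete, here is a self-contained toy sketch of that pattern. It is purely illustrative and does not use rLLM's actual classes or API; see the repository's examples for the real entry points.

```python
# A conceptual, self-contained sketch of the agent/environment abstraction that
# an RL post-training framework like rLLM builds on. Everything here is a toy
# stand-in written for illustration; it does not use rLLM's actual classes.
import random

class GuessEnv:
    """Toy environment: reward 1.0 if the agent answers 'answer: 4'."""

    def reset(self) -> str:
        return "What is 2 + 2? Reply as 'answer: <number>'."

    def step(self, action: str):
        reward = 1.0 if action.strip() == "answer: 4" else 0.0
        return reward, True  # (reward, done)

class RandomAgent:
    """Toy policy: guesses a digit; a real agent would query an LLM policy."""

    def act(self, observation: str) -> str:
        return f"answer: {random.randint(0, 9)}"

def rollout(agent: RandomAgent, env: GuessEnv):
    obs = env.reset()
    action = agent.act(obs)
    reward, _done = env.step(action)
    return obs, action, reward

env, agent = GuessEnv(), RandomAgent()
trajectories = [rollout(agent, env) for _ in range(8)]
print("mean reward:", sum(r for *_, r in trajectories) / len(trajectories))
```

In an actual RL run, the collected trajectories and rewards would be fed to a policy-gradient trainer instead of simply being averaged.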
[2025/07/01] We release DeepSWE-Preview, a 32B software engineering (SWE) agent trained purely with RL that achieves 59% on SWE-Bench-Verified with test-time scaling (42.2% Pass@1), topping the SWE-Bench-Verified leaderboard for open-weight models.
- An In-Depth Blog Post on our SWE Agents and RL Training Recipes
- HF Model: DeepSWE-Preview
- HF Dataset: R2E-Gym-Subset
- Training Scripts
- Wandb Training Logs – All training runs and ablations.
- Evaluation Logs – 16 passes over SWE-Bench-Verified.
[2025/04/08] We release DeepCoder-14B-Preview, a 14B coding model that achieves an impressive 60.6% Pass@1 accuracy on LiveCodeBench (+8% improvement), matching the performance of o3-mini-2025-01-31 (Low) and o1-2024-12-17.
- An In-Depth Blog Post on our Training Recipe and Insights
- HF Models: DeepCoder-14B-Preview, DeepCoder-1.5B-Preview
- HF Dataset: DeepCoder-Preview-Dataset
- Training Scripts – Exact hyperparameters we used to achieve o3-mini performance.
- Wandb Training Logs – All training runs and ablations.
- Evaluation Logs – LiveCodeBench and Codeforces logs for DeepCoder.
[2025/02/10] We release DeepScaleR-1.5B-Preview, a 1.5B model that surpasses O1-Preview and achieves 43.1% Pass@1 on AIME. We achieve this by iteratively scaling DeepSeek's GRPO algorithm from 8K→16K→24K context length for thinking (a rough sketch of this staged schedule follows the list below).
- An In-Depth Blog Post on our Training Recipe and Insights
- HF Model: DeepScaleR-1.5B-Preview
- HF Dataset: DeepScaleR-Preview-Dataset / JSON Dataset
- Training Scripts – Exact hyperparameters we used to achieve 43.1% on AIME.
- Wandb Training Logs – All training runs and ablations.
  - Due to Wandb migration bugs, the 8K training run is compressed to 400-500 steps. The data is identical, but our original run was 1600 steps.
- Evaluation Logs – DeepScaleR, DeepSeek Distill, and Still 1.5B generations over 1000+ math problems.
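As a rough sketch of the iterative context-length scaling mentioned above: training proceeds in stages, each resuming from the previous stage's checkpoint with a longer response budget. The script path, config keys, and checkpoint paths below are hypothetical placeholders for illustration, not the exact scripts or options shipped in this repository.

```python
# Illustrative staged schedule for iterative context-length scaling
# (8K -> 16K -> 24K). All paths and config keys here are hypothetical.
import subprocess

stages = [
    {"max_response_length": 8192,  "resume_from": None},
    {"max_response_length": 16384, "resume_from": "ckpts/stage_8k"},
    {"max_response_length": 24576, "resume_from": "ckpts/stage_16k"},
]

for stage in stages:
    cmd = [
        "bash", "scripts/train/run_deepscaler_1.5b.sh",  # placeholder script
        f"data.max_response_length={stage['max_response_length']}",
    ]
    if stage["resume_from"]:
        cmd.append(f"trainer.resume_from={stage['resume_from']}")  # placeholder key
    subprocess.run(cmd, check=True)
```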
# Clone the repository
git clone --recurse-submodules https://github.com/agentica-project/rllm.git
cd rllm
# Create a conda environment
conda create -n rllm python=3.10
conda activate rllm
# Install all dependencies
pip install -e ./verl
pip install -e .
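After installation, the released checkpoints can be used like any other Hugging Face causal LM. A minimal sketch, assuming the models are hosted under the agentica-org organization (adjust the repo id if it differs):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed HF repo id; adjust if the model is hosted elsewhere.
model_id = "agentica-org/DeepScaleR-1.5B-Preview"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" requires the `accelerate` package.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

prompt = "What is the sum of the first 100 positive integers?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```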
- Our training experiments are powered by our heavily modified fork of verl, an open-source RLHF library.
- Our models are trained on top of DeepSeek-R1-Distill-Qwen-1.5B, DeepSeek-R1-Distill-Qwen-14B, and Qwen3-32B.
- Our work is done as part of Berkeley Sky Computing Lab, Berkeley AI Research, and a successful collaboration with Together AI.
Citing rLLM:
@misc{rllm2025,
title={rLLM: A Framework for Post-Training Language Agents},
author={Sijun Tan and Michael Luo and Colin Cai and Tarun Venkat and Kyle Montgomery and Aaron Hao and Tianhao Wu and Arnav Balyan and Manan Roongta and Chenguang Wang and Li Erran Li and Raluca Ada Popa and Ion Stoica},
year={2025},
howpublished={\url{https://pretty-radio-b75.notion.site/rLLM-A-Framework-for-Post-Training-Language-Agents-21b81902c146819db63cd98a54ba5f31}},
note={Notion Blog}
}
Citing DeepSWE:
@misc{deepswe2025,
title={DeepSWE: Training a State-of-the-Art Coding Agent from Scratch by Scaling RL},
author={Michael Luo and Naman Jain and Jaskirat Singh and Sijun Tan and Ameen Patel and Qingyang Wu and Alpay Ariyak and Colin Cai and Tarun Venkat and Shang Zhu and Ben Athiwaratkun and Manan Roongta and Ce Zhang and Li Erran Li and Raluca Ada Popa and Koushik Sen and Ion Stoica},
howpublished={\url{https://pretty-radio-b75.notion.site/DeepSWE-Training-a-Fully-Open-sourced-State-of-the-Art-Coding-Agent-by-Scaling-RL-22281902c1468193aabbe9a8c59bbe33}},
note={Notion Blog},
year={2025}
}
Citing DeepCoder:
@misc{deepcoder2025,
title={DeepCoder: A Fully Open-Source 14B Coder at O3-mini Level},
author={Michael Luo and Sijun Tan and Roy Huang and Ameen Patel and Alpay Ariyak and Qingyang Wu and Xiaoxiang Shi and Rachel Xin and Colin Cai and Maurice Weber and Ce Zhang and Li Erran Li and Raluca Ada Popa and Ion Stoica},
howpublished={\url{https://pretty-radio-b75.notion.site/DeepCoder-A-Fully-Open-Source-14B-Coder-at-O3-mini-Level-1cf81902c14680b3bee5eb349a512a51}},
note={Notion Blog},
year={2025}
}
Citing DeepScaleR:
@misc{deepscaler2025,
title={DeepScaleR: Surpassing O1-Preview with a 1.5B Model by Scaling RL},
author={Michael Luo and Sijun Tan and Justin Wong and Xiaoxiang Shi and William Y. Tang and Manan Roongta and Colin Cai and Jeffrey Luo and Li Erran Li and Raluca Ada Popa and Ion Stoica},
year={2025},
howpublished={\url{https://pretty-radio-b75.notion.site/DeepScaleR-Surpassing-O1-Preview-with-a-1-5B-Model-by-Scaling-RL-19681902c1468005bed8ca303013a4e2}},
note={Notion Blog}
}