Carview!

🚀 Exploratory Annealed Decoding (EAD)

We're excited to present our latest research contribution: Exploratory Annealed Decoding (EAD) for Verifiable Reinforcement Learning!

📄 Research Paper
🌐 Project Website

Our codebase is based on verl RL framework and vLLM inference engine. We made our best efforts to save the commit history so everyone can easily find what we have updated. We are working to integrate our methods to both upstreams. Stay tuned!

Citation

If you use this code as part of any published research, please acknowledge the following paper:

@article{yang2025ead,
  title={Let it Calm: Exploratory Annealed Decoding for Verifiable Reinforcement Learning},
  author={Yang, Chenghao and Gui, Lin and Yang, Chenxiao and Veitch, Victor and Zhang, Lizhu and Zhao, Zhuokai},
  journal={arXiv preprint arXiv:2510.05251},
  year={2025}
}

What is EAD?

EAD is a simple yet effective exploration strategy for Reinforcement Learning with Verifiable Rewards (RLVR) that addresses a fundamental challenge: achieving effective exploration while preserving sample quality and ensuring training stability.

Core Insight: Exploration is not equally valuable at every step. Early tokens shape a sequence's semantic direction, making early exploration crucial for discovering diverse valid solutions. Later tokens fill in details where excessive exploration can harm coherence.

Our Strategy: Explore at the beginning, exploit at the end

EAD implements an intuitive temperature annealing schedule that:

Starts with high temperature (τ > 1) to encourage diverse exploration of solution paths
Gradually cools to lower temperatures to ensure coherent, high-quality completions
Maintains proximity to the target policy for stable off-policy learning

Mathematical Formulation

EAD uses a dynamic temperature schedule that starts high and gradually decreases:

$$\tau_t = \max{1 + \tau_\mathrm{max} - e^{t/d}, \tau_\mathrm{min}}$$

Where:

$\tau_t$ is the temperature at token position $t$
$\tau_\mathrm{max} > 1$ is the maximum temperature for exploration
$\tau_\mathrm{min}$ is the minimum temperature for exploitation
$d$ is the decay rate controlling annealing speed

The decay rate is made global-step-aware to adapt to increasing response lengths:

$$d_s = \min(d_0 + 5s, 40000)$$

Where $s$ is the training step, ensuring the annealing schedule scales with model capabilities.

Key Benefits

Plug-and-Play Enhancement: Improves sample efficiency over fixed-temperature sampling
Broad Compatibility: Works with various RLVR algorithms (GRPO, DAPO, EntropyMech)
Sample Efficient: Achieves strong results with fewer rollouts
Inference-Time Benefits: Also improves generation quality at test time
Mitigates Entropy Collapse: Helps escape local optima during training plateaus

Key Results

Note: For the best viewing experience, please visit our research website to see all figures in high resolution.

Figure 1: Annealing Schedule

Shows how different decay rates d affect the temperature schedule
A larger d slows the cooling, front-loading exploration over more tokens
View full figure

Figure 2: Performance Results

Pass@16 and Worst@16 evaluation in RL training
EAD significantly improves exploration of high-quality samples
View full figure

Figure 3: Entropy Dynamics

EAD mitigates entropy collapse by maintaining exploration throughout training
Helps escape local optima during plateau stages
View full figure

Figure 4: Algorithm Compatibility

EAD works with various RL algorithms (GRPO, EntropyMech)
Consistently outperforms fixed-temperature sampling
View full figure

Try EAD

Check out the implementation in recipe/ead/ and explore the research website for detailed results and visualizations.

Quick Start with EAD:

# Follow Minimal-RL to prepare the dataset
# Then run EAD with annealed sampling
cd recipe/ead
bash run_annealed_sampling.sh

Name		Name	Last commit message	Last commit date
Latest commit History 963 Commits
.gemini		.gemini
.github		.github
.vscode		.vscode
docker		docker
docs		docs
examples		examples
figures		figures
recipe		recipe
scripts		scripts
tests		tests
verl		verl
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
.readthedocs.yaml		.readthedocs.yaml
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
Notice.txt		Notice.txt
README.md		README.md
pyproject.toml		pyproject.toml
requirements-npu.txt		requirements-npu.txt
requirements.txt		requirements.txt
requirements_sglang.txt		requirements_sglang.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

🚀 Exploratory Annealed Decoding (EAD)

Citation

What is EAD?

Mathematical Formulation

Key Benefits

Key Results

Try EAD

About

Uh oh!

Releases

Packages

Contributors 256

Uh oh!

Languages

License

yangalan123/EAD-RLVR

Folders and files

Latest commit

History

Repository files navigation

🚀 Exploratory Annealed Decoding (EAD)

Citation

What is EAD?

Mathematical Formulation

Key Benefits

Key Results

Try EAD

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 256

Uh oh!

Languages

Packages