RLAD Framework
RLAD jointly trains:
- Abstraction Generator – proposes problem-specific abstractions.
- Solution Generator – learns to solve problems by leveraging abstractions.
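To make the division of labor concrete, here is a minimal sketch of how the two generators could be chained at inference time. It assumes each model is just a Python callable from prompt to text. The prompt templates and the names `build_solution_prompt`, `solve_with_abstractions`, `toy_abstraction_gen` and `toy_solution_gen` are hypothetical illustrations, not RLAD's actual interface.

```python
from typing import Callable, List, Tuple

# Assumption: a "generator" is any function that maps a prompt string to generated text.
Generator = Callable[[str], str]


def build_solution_prompt(problem: str, abstraction: str) -> str:
    # Hypothetical template: the abstraction is shown to the solver ahead of the problem.
    return f"Abstraction:\n{abstraction}\n\nProblem:\n{problem}\n\nSolution:"


def solve_with_abstractions(
    problem: str,
    abstraction_gen: Generator,
    solution_gen: Generator,
    num_abstractions: int = 4,
) -> List[Tuple[str, str]]:
    """Propose several abstractions for one problem, then solve conditioned on each."""
    pairs = []
    for _ in range(num_abstractions):
        # Stage 1: the abstraction generator proposes a problem-specific abstraction.
        abstraction = abstraction_gen(f"Problem:\n{problem}\n\nPropose a useful abstraction:")
        # Stage 2: the solution generator solves the problem conditioned on that abstraction.
        solution = solution_gen(build_solution_prompt(problem, abstraction))
        pairs.append((abstraction, solution))
    return pairs


# Toy stand-ins for the two models (hypothetical), used only to exercise the control flow.
def toy_abstraction_gen(prompt: str) -> str:
    return "Use the closed form n(n+1)/2 for the sum of the first n integers."


def toy_solution_gen(prompt: str) -> str:
    # Answers correctly only when the abstraction's key idea appears in its prompt.
    return "36" if "n(n+1)/2" in prompt else "unsure"


pairs = solve_with_abstractions(
    "Sum the integers from 1 to 8.", toy_abstraction_gen, toy_solution_gen, num_abstractions=2
)
print(pairs)
```

With real models, the two callables would wrap sampling from the two trained LLMs; keeping them as plain functions makes it easy to sample several abstractions per problem and compare the solutions they lead to.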
Training proceeds in two phases:
- Warm start: supervised fine-tuning on abstraction-solution pairs generated by stronger models.
- Reinforcement learning: abstractions are rewarded when they improve the solution generator's success rate.
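The reward in the second phase can be sketched as a Monte Carlo estimate. The snippet below is a minimal sketch, assuming the solver is sampled several times with and without a given abstraction, each answer is graded by a verifier, and the reward is the resulting gain in success rate (or, in the `binary` variant, 1 whenever there is any gain). The exact reward RLAD uses may differ, and the function names `rollout_outcomes`, `success_rate` and `abstraction_reward` are hypothetical.

```python
from typing import Callable, List, Sequence


def rollout_outcomes(solve: Callable[[], str], is_correct: Callable[[str], bool], n: int) -> List[bool]:
    """Sample n solutions from the solver and grade each one with a verifier."""
    return [is_correct(solve()) for _ in range(n)]


def success_rate(outcomes: Sequence[bool]) -> float:
    """Monte Carlo estimate of the solver's success rate."""
    if not outcomes:
        raise ValueError("need at least one sampled solution")
    return sum(outcomes) / len(outcomes)


def abstraction_reward(with_abs: Sequence[bool], without_abs: Sequence[bool], binary: bool = False) -> float:
    """Assumed reward: how much the abstraction raises the solver's success rate.

    binary=False returns the raw gain; binary=True returns 1.0 only if the gain is positive.
    """
    gain = success_rate(with_abs) - success_rate(without_abs)
    if binary:
        return 1.0 if gain > 0 else 0.0
    return gain


# Example: the solver succeeds on 3/4 rollouts with the abstraction and 1/4 without it.
with_abs = [True, True, False, True]
without_abs = [True, False, False, False]
print(abstraction_reward(with_abs, without_abs))               # 0.5
print(abstraction_reward(with_abs, without_abs, binary=True))  # 1.0
```

Using the gain rather than the raw conditioned success rate means an abstraction earns nothing on problems the solver already handles equally well without help.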
Experimental Results
Main Performance Results on Math Reasoning Benchmarks
| Approach | AIME 2025 w/o abs (avg) | AIME 2025 w/ abs (avg) | AIME 2025 w/ abs (best) | DeepScaleR [Hard] w/o abs (avg) | DeepScaleR [Hard] w/ abs (avg) | DeepScaleR [Hard] w/ abs (best) | AMC 2023 w/o abs (avg) | AMC 2023 w/ abs (avg) | AMC 2023 w/ abs (best) |
|---|---|---|---|---|---|---|---|---|---|
| Qwen-3-1.7B | 33.75 | 36.25 | 40.00 | 20.21 | 22.14 | 32.50 | 86.41 | 78.01 | 84.53 |
| + DAPO | 37.92 | 34.90 | 39.79 | 21.67 | 21.88 | 33.54 | 86.41 | 81.99 | 88.44 |
| + RLAD | 38.04 | 42.45 | 48.33 | 23.54 | 24.84 | 35.54 | 87.25 | 88.35 | 91.72 |
Table: Accuracy (%) on math reasoning benchmarks. RLAD achieves the highest accuracy in every column, improving over both the base model and DAPO with and without abstractions across AIME 2025, DeepScaleR [Hard], and AMC 2023. We report accuracy without abstractions, with abstractions (pass@1 estimated from 16 samples), and with the best abstraction (pass@16).
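The table's two kinds of columns can be computed from per-problem counts of correct samples. A minimal sketch, assuming each problem is attempted n = 16 times, "avg" is the mean pass@1 over problems, and "best" is pass@16 computed with the common unbiased estimator 1 - C(n-c, k)/C(n, k); with n = k = 16 this reduces to whether any of the 16 attempts is correct. How the evaluation groups attempts across different abstractions is not specified here, so treat this as an illustration rather than the evaluation code. The function names are hypothetical.

```python
from math import comb
from typing import Sequence, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        # Every size-k subset of the samples contains at least one correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_scores(correct_counts: Sequence[int], n: int = 16) -> Tuple[float, float]:
    """Return (mean pass@1, mean pass@n) in percent over all problems."""
    pass1 = sum(pass_at_k(n, c, 1) for c in correct_counts) / len(correct_counts)
    passn = sum(pass_at_k(n, c, n) for c in correct_counts) / len(correct_counts)
    return 100 * pass1, 100 * passn


# Four toy problems with 16, 8, 0 and 4 correct samples out of 16.
print(benchmark_scores([16, 8, 0, 4]))  # (43.75, 75.0)
```

Per problem, pass@1 is simply c/16, so the "avg" columns average over samples rather than reflecting a single run, while pass@16 rewards any problem that at least one attempt solves.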
A typical example of a reasoning abstraction proposed by our abstraction generator.
Figure: The solution explicitly refers to the abstraction, and keywords from the abstraction are used meaningfully in the solution generator's reasoning trace.