
RLAD Framework

Workflow

RLAD jointly trains:

  • Abstraction Generator – proposes problem-specific abstractions.
  • Solution Generator – learns to solve problems by leveraging abstractions.
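The division of labor between the two components can be sketched as a two-step pipeline: propose an abstraction, then solve the problem conditioned on it. This is only an illustrative sketch. The type aliases, function names, and toy stand-ins below are hypothetical and do not come from the RLAD code; in practice both components would be language models.

```python
from typing import Callable

# Hypothetical interfaces: each component maps text to text.
AbstractionGenerator = Callable[[str], str]      # problem -> abstraction
SolutionGenerator = Callable[[str, str], str]    # (problem, abstraction) -> solution


def solve_with_abstraction(
    problem: str,
    propose: AbstractionGenerator,
    solve: SolutionGenerator,
) -> str:
    """Propose a problem-specific abstraction, then solve conditioned on it."""
    abstraction = propose(problem)
    return solve(problem, abstraction)


# Toy stand-ins so the sketch runs without any model.
def toy_propose(problem: str) -> str:
    return f"Hint for: {problem}"


def toy_solve(problem: str, abstraction: str) -> str:
    return f"[{abstraction}] answer to {problem}"
```

Calling `solve_with_abstraction("2+2", toy_propose, toy_solve)` threads the proposed abstraction into the solver's input, which is the only coupling between the two components that this sketch assumes.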

Training proceeds in two phases:

  • Warm-start with supervised fine-tuning on abstraction-solution pairs from stronger models.
  • Reinforcement learning, in which abstractions are rewarded if they improve the success rate of solution generation.
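One possible reading of the reinforcement learning reward is the change in solve rate when the solution generator is conditioned on an abstraction. The sketch below works under that assumption and is not the reward definition used by the authors: the function names, the use of boolean per-sample outcomes, and the clipping at zero are all hypothetical choices made for illustration.

```python
from typing import Sequence


def success_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of sampled solutions that were judged correct."""
    if not outcomes:
        raise ValueError("need at least one sampled solution")
    return sum(outcomes) / len(outcomes)


def abstraction_reward(
    with_abstraction: Sequence[bool],
    without_abstraction: Sequence[bool],
) -> float:
    """Assumed reward: improvement in success rate, clipped at zero.

    An abstraction that does not help (or hurts) receives no reward.
    """
    gain = success_rate(with_abstraction) - success_rate(without_abstraction)
    return max(0.0, gain)
```

For example, if 3 of 4 samples are correct with the abstraction and 1 of 4 without it, `abstraction_reward([True, True, False, True], [True, False, False, False])` evaluates to 0.5.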

Experimental Results

Main Performance Results on Math Reasoning Benchmarks

Approach    | AIME 2025                                     | DeepScaleR [Hard]                             | AMC 2023
            | w/o abs (avg) | w/ abs (avg)  | w/ abs (best) | w/o abs (avg) | w/ abs (avg)  | w/ abs (best) | w/o abs (avg) | w/ abs (avg)  | w/ abs (best)
Qwen-3-1.7B | 33.75         | 36.25         | 40.00         | 20.21         | 22.14         | 32.50         | 86.41         | 78.01         | 84.53
+ DAPO      | 37.92         | 34.90         | 39.79         | 21.67         | 21.88         | 33.54         | 86.41         | 81.99         | 88.44
+ RLAD      | 38.04         | 42.45         | 48.33         | 23.54         | 24.84         | 35.54         | 87.25         | 88.35         | 91.72

Table: Accuracy on math reasoning benchmarks. RLAD achieves consistent gains in both the abstraction-conditioned and no-abstraction settings across AIME 2025, DeepScaleR [Hard], and AMC 2023. We report performance without abstractions (w/o abs), with abstractions (w/ abs (avg); pass@1 with 16 samples), and with the best abstraction (w/ abs (best); pass@16).
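The gains claimed in the caption can be checked directly from the reported numbers. The short script below hard-codes the nine values per row from the table and computes the per-column difference between two approaches; the names `RESULTS`, `COLUMNS`, and `gains` are introduced here only for illustration.

```python
# Values copied from the table, in column order.
COLUMNS = [
    f"{bench} {setting}"
    for bench in ("AIME 2025", "DeepScaleR [Hard]", "AMC 2023")
    for setting in ("w/o abs (avg)", "w/ abs (avg)", "w/ abs (best)")
]

RESULTS = {
    "Qwen-3-1.7B": [33.75, 36.25, 40.00, 20.21, 22.14, 32.50, 86.41, 78.01, 84.53],
    "+ DAPO": [37.92, 34.90, 39.79, 21.67, 21.88, 33.54, 86.41, 81.99, 88.44],
    "+ RLAD": [38.04, 42.45, 48.33, 23.54, 24.84, 35.54, 87.25, 88.35, 91.72],
}


def gains(method: str, baseline: str) -> dict[str, float]:
    """Per-column accuracy difference (method minus baseline), rounded to 2 decimals."""
    return {
        col: round(m - b, 2)
        for col, m, b in zip(COLUMNS, RESULTS[method], RESULTS[baseline])
    }


if __name__ == "__main__":
    for baseline in ("Qwen-3-1.7B", "+ DAPO"):
        print(f"+ RLAD vs {baseline}:")
        for col, delta in gains("+ RLAD", baseline).items():
            print(f"  {col}: {delta:+.2f}")
```

On these numbers, every difference for RLAD is positive against both the base model and DAPO, with the smallest margin on AIME 2025 without abstractions against DAPO (38.04 vs. 37.92).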

A typical example of a reasoning abstraction proposed by our abstraction generator.


Figure: In the solution, we see references to the abstraction and keywords from the abstraction being used meaningfully in the reasoning trace of the solution generator model.