Yuchen Zeng*1,2, Shuibai Zhang*1, Wonjun Kang*3,4, Shutong Wu1, Lynnix Zou1, Ying Fan1,2, Heeju Kim3, Ziqian Lin1, Jungtaek Kim1, Hyung Il Koo3, Dimitris Papailiopoulos1,2, Kangwook Lee1,5
*Equal Contribution 1University of Wisconsin-Madison 2Microsoft Research 3FuriosaAI 4Seoul National University 5Krafton
Abstract: Large Language Models (LLMs) typically reason via Chain-of-Thought (CoT) prompting or explicit training. Though many LLMs achieve similar accuracy on challenging tasks, such as math problem solving and programming, how their underlying reasoning "algorithms" compare remains poorly understood. To investigate this, we propose ReJump, which represents a reasoning trace as a visitation order over nodes in a tree of intermediate problem-solving steps. ReJump allows tree jumps, non-adjacent transitions between nodes that capture reasoning behaviors such as backtracking, verification, and calculation. This representation enables analyzing LLM reasoning with diverse and intuitive metrics that capture exploration, exploitation, overthinking, forgetting, and verification. We apply ReJump to analyze state-of-the-art Large Reasoning Models (LRMs), which are LLMs explicitly trained for long-form CoTs, and find that models with comparable final accuracy can nonetheless display distinct reasoning behaviors. We further compare distilled LRMs with their teachers, CoT-prompted LLMs with LRMs, and investigate how reasoning examples influence reasoning behavior. Finally, we show that ReJump can improve reasoning quality at test time through strategies such as ReJump-guided Best-of-N selection and prompt selection.
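As a concrete illustration of the representation described above (a toy sketch with hypothetical data structures — the repository's actual trace format may differ), a ReJump trace is a visitation order over tree nodes, and a "jump" is any transition between non-adjacent nodes:

```python
# Illustrative sketch of the ReJump view (hypothetical data structures;
# the repository's actual trace format may differ).

# Tree of intermediate problem-solving steps, stored as node -> parent.
parent = {"root": None, "s1": "root", "s2": "s1", "s3": "s1", "s4": "s3"}

def adjacent(a, b):
    """Two nodes are adjacent if one is the parent of the other."""
    return parent.get(a) == b or parent.get(b) == a

def count_jumps(trace):
    """Count tree jumps: transitions between non-adjacent nodes, which
    capture behaviors such as backtracking and re-verification."""
    return sum(1 for a, b in zip(trace, trace[1:]) if not adjacent(a, b))

# A visitation order in which the model tries s2, moves sideways to its
# sibling s3, revisits s2 to verify it, then continues below s3.
trace = ["root", "s1", "s2", "s3", "s2", "s3", "s4"]
print(count_jumps(trace))  # → 3 (the three sibling-to-sibling moves)
```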
- Our paper is available on arXiv!
- Step 1: Set Up Environment
- Step 2: Collect LLM Responses on MATH500, Game of 24, and Sudoku
- Step 3: Perform Reasoning Analysis via ReJump
To set up the environment for running the experiments in this repository, follow the steps below. These instructions have been tested on Linux.
- Clone this repository.

  ```shell
  git clone https://github.com/UW-Madison-Lee-Lab/ReJump.git
  cd ReJump
  ```

- Install dependencies.

  ```shell
  # create the environment that works for all experiments in our paper
  conda env create -f conda_env/liftr.yml
  conda activate liftr
  pip install -e .
  ```
- Create `environment.py` in the `liftr` directory. Note that all variables except `root_dir` need to be configured on your own.

  ```python
  import os

  root_dir = os.path.dirname(os.path.abspath(__file__))
  OPENAI_API_KEY = '<your-openai-api-key>'
  HUGGINGFACE_API_KEY = '<your-huggingface-api-key>'
  ANTHROPIC_API_KEY = '<your-anthropic-api-key>'
  GEMINI_API_KEY = "<your-gemini-api-key>"
  DEEPSEEK_API_KEY = "<your-deepseek-api-key>"
  OPENROUTER_API_KEY = "<your-openrouter-api-key>"
  XAI_API_KEY = "<your-xai-api-key>"
  ALIBABA_API_KEY = "<your-alibaba-api-key>"
  HF_HOME = "<path-to-your-hf-home>"
  TRANSFORMERS_CACHE = "<path-to-your-transformers-cache>"
  TRITON_CACHE_DIR = "<path-to-your-triton-cache>"
  WANDB_INFO = {
      'project': 'liftr',
      'entity': '<your-wandb-entity>'
  }
  CONDA_PATH = "<path-to-your-conda>"
  ```
Important: Do not commit this file to version control. It contains sensitive API keys and should never be pushed to GitHub or any other hosting service.
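One simple safeguard (our suggestion, not a step required by the repository) is to list the file in `.gitignore` so git never stages it:

```shell
# Keep the secrets file out of version control (assumes it lives at
# liftr/environment.py relative to the repository root).
echo "liftr/environment.py" >> .gitignore
```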
Check `constants.py` for all supported LLMs.
MATH500:

```shell
python -m run_exps.create_exps \
    --dataset math500 \
    --model <model_name> \
    --mode reasoning \
    --shot 0 \
    --n_samples 500 \
    --n_query 1 \
    --exp_name <exp_name> \
    --temperature <temperature>
bash run_exps/auto/run_all_<exp_name>.sh
```

Game of 24:

```shell
python -m run_exps.create_exps \
    --dataset game24 \
    --model <model_name> \
    --mode reasoning \
    --shot 0 \
    --n_samples 100 \
    --n_query 1 \
    --exp_name <exp_name> \
    --temperature <temperature>
bash run_exps/auto/run_all_<exp_name>.sh
```

Sudoku:

```shell
python -m run_exps.create_exps \
    --dataset sudoku \
    --model <model_name> \
    --mode reasoning \
    --shot 0 \
    --n_samples 100 \
    --n_query 1 \
    --exp_name <exp_name> \
    --temperature <temperature>
bash run_exps/auto/run_all_<exp_name>.sh
```
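The three collection runs differ only in dataset and sample count, so a small wrapper loop can print all the command pairs for review before running them. This is our sketch, not a script shipped with the repo; `<model_name>`, `<temperature>`, and the `my_<dataset>` experiment names are placeholders:

```shell
# Dry run: print the create_exps / run_all command pair for each dataset.
# Remove the `echo`s (and fill in the placeholders) to actually execute.
for cfg in "math500 500" "game24 100" "sudoku 100"; do
  set -- $cfg   # $1 = dataset, $2 = sample count
  echo "python -m run_exps.create_exps --dataset $1 --model <model_name>" \
       "--mode reasoning --shot 0 --n_samples $2 --n_query 1" \
       "--exp_name my_$1 --temperature <temperature>"
  echo "bash run_exps/auto/run_all_my_$1.sh"
done
```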
MATH500:

```shell
python -m rejump_extractor.tree_vis_math_v3 \
    --dataset math500 \
    --model_name <model_name> \
    --temperature <temperature> \
    --num_samples 500 \
    --wandb
```

Game of 24:

```shell
python -m rejump_extractor.tree_vis_game24 \
    --dataset game24 \
    --model_name <model_name> \
    --temperature <temperature> \
    --num_samples 100 \
    --wandb
```

Sudoku:

```shell
python -m rejump_extractor.tree_vis_game24 \
    --dataset sudoku \
    --model_name <model_name> \
    --temperature <temperature> \
    --num_samples 100 \
    --wandb
```
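The abstract mentions ReJump-guided Best-of-N selection as a test-time application. As a toy illustration of the idea only (the scoring heuristic below is ours, not the paper's method, and all data structures are hypothetical), one could rank N candidate traces by a jump-based metric and keep the best:

```python
# Toy sketch of ReJump-guided Best-of-N selection. The "fewest jumps wins"
# heuristic is illustrative only, not the paper's actual selection rule.

def jump_fraction(trace, parent):
    """Fraction of transitions that are tree jumps (non-adjacent moves)."""
    def adjacent(a, b):
        return parent.get(a) == b or parent.get(b) == a
    pairs = list(zip(trace, trace[1:]))
    return sum(not adjacent(a, b) for a, b in pairs) / max(len(pairs), 1)

def best_of_n(traces, parent):
    """Among N candidate traces, prefer the one with the lowest jump
    fraction, i.e. the least backtracking under this toy metric."""
    return min(traces, key=lambda t: jump_fraction(t, parent))

# Example tree: r has child a; a has children b and c.
parent = {"r": None, "a": "r", "b": "a", "c": "a"}
direct = ["r", "a", "b"]                  # no jumps
wandering = ["r", "a", "b", "c", "b"]     # two sibling-to-sibling jumps
print(best_of_n([direct, wandering], parent))  # → ['r', 'a', 'b']
```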