Yuchen Zeng*1,2, Shuibai Zhang*1, Wonjun Kang*3,4, Shutong Wu1, Lynnix Zou1, Ying Fan1,2, Heeju Kim3, Ziqian Lin1, Jungtaek Kim1, Hyung Il Koo3, Dimitris Papailiopoulos1,2, Kangwook Lee1,5
*Equal Contribution 1University of Wisconsin-Madison 2Microsoft Research 3FuriosaAI 4Seoul National University 5Krafton
Abstract: Large Language Models (LLMs) typically reason via Chain-of-Thought (CoT) prompting or explicit training. Though many LLMs achieve similar accuracy on challenging tasks, such as math problem solving and programming, how their underlying reasoning "algorithms" compare remains poorly understood. To investigate this, we propose ReJump, which represents a reasoning trace as a visitation order over nodes in a tree of intermediate problem-solving steps. ReJump allows tree jumps, non-adjacent transitions between nodes that capture reasoning behaviors such as backtracking, verification, and calculation. This representation enables analyzing LLM reasoning with diverse and intuitive metrics that capture exploration, exploitation, overthinking, forgetting, and verification. We apply ReJump to analyze state-of-the-art Large Reasoning Models (LRMs), which are LLMs explicitly trained for long-form CoTs, and find that models with comparable final accuracy can nonetheless display distinct reasoning behaviors. We further compare distilled LRMs with their teachers, CoT-prompted LLMs with LRMs, and investigate how reasoning examples influence reasoning behavior. Finally, we show that ReJump can improve reasoning quality at test time through strategies such as ReJump-guided Best-of-N selection and prompt selection.
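As a concrete illustration of the representation described above (a toy sketch with hypothetical data structures — the repository's actual trace format may differ), a ReJump trace is a visitation order over tree nodes, and a "jump" is any transition between non-adjacent nodes:

```python
# Illustrative sketch of the ReJump view (hypothetical data structures;
# the repository's actual trace format may differ).

# Tree of intermediate problem-solving steps, stored as node -> parent.
parent = {"root": None, "s1": "root", "s2": "s1", "s3": "s1", "s4": "s3"}

def adjacent(a, b):
    """Two nodes are adjacent if one is the parent of the other."""
    return parent.get(a) == b or parent.get(b) == a

def count_jumps(trace):
    """Count tree jumps: transitions between non-adjacent nodes, which
    capture behaviors such as backtracking and re-verification."""
    return sum(1 for a, b in zip(trace, trace[1:]) if not adjacent(a, b))

# A visitation order in which the model tries s2, moves sideways to its
# sibling s3, revisits s2 to verify it, then continues below s3.
trace = ["root", "s1", "s2", "s3", "s2", "s3", "s4"]
print(count_jumps(trace))  # → 3 (the three sibling-to-sibling moves)
```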
- Our paper is available on arXiv!
- Step 1: Set Up Environment
- Step 2: Collect LLM Responses on MATH500, Game of 24, and Sudoku
- Step 3: Perform Reasoning Analysis via ReJump
To set up the environment for running the experiments in this repository, follow the steps below. These instructions have been tested on Linux.
- Clone this repository.

  ```shell
  git clone https://github.com/UW-Madison-Lee-Lab/ReJump.git
  cd ReJump
  ```

- Install dependencies.

  ```shell
  # create the environment that works for all experiments in our paper
  conda env create -f conda_env/liftr.yml
  conda activate liftr
  pip install -e .
  ```
- Create `environment.py` in the `liftr` directory. Note that all variables except `root_dir` need to be configured on your own.

  ```python
  import os

  root_dir = os.path.dirname(os.path.abspath(__file__))
  OPENAI_API_KEY = '<your-openai-api-key>'
  HUGGINGFACE_API_KEY = '<your-huggingface-api-key>'
  ANTHROPIC_API_KEY = '<your-anthropic-api-key>'
  GEMINI_API_KEY = "<your-gemini-api-key>"
  DEEPSEEK_API_KEY = "<your-deepseek-api-key>"
  OPENROUTER_API_KEY = "<your-openrouter-api-key>"
  XAI_API_KEY = "<your-xai-api-key>"
  ALIBABA_API_KEY = "<your-alibaba-api-key>"
  HF_HOME = "<path-to-your-hf-home>"
  TRANSFORMERS_CACHE = "<path-to-your-transformers-cache>"
  TRITON_CACHE_DIR = "<path-to-your-triton-cache>"
  WANDB_INFO = {
      'project': 'liftr',
      'entity': '<your-wandb-entity>'
  }
  CONDA_PATH = "<path-to-your-conda>"
  ```
Important: Do not commit this file to version control. It contains sensitive API keys and should never be pushed to GitHub or any other hosting service.
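One simple safeguard (our suggestion, not a step required by the repository) is to list the file in `.gitignore` so git never stages it:

```shell
# Keep the secrets file out of version control (assumes it lives at
# liftr/environment.py relative to the repository root).
echo "liftr/environment.py" >> .gitignore
```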
Check `constants.py` for all supported LLMs.
MATH500:

```shell
python -m run_exps.create_exps \
    --dataset math500 \
    --model <model_name> \
    --mode reasoning \
    --shot 0 \
    --n_samples 500 \
    --n_query 1 \
    --exp_name <exp_name> \
    --temperature <temperature>
bash run_exps/auto/run_all_<exp_name>.sh
```

Game of 24:

```shell
python -m run_exps.create_exps \
    --dataset game24 \
    --model <model_name> \
    --mode reasoning \
    --shot 0 \
    --n_samples 100 \
    --n_query 1 \
    --exp_name <exp_name> \
    --temperature <temperature>
bash run_exps/auto/run_all_<exp_name>.sh
```

Sudoku:

```shell
python -m run_exps.create_exps \
    --dataset sudoku \
    --model <model_name> \
    --mode reasoning \
    --shot 0 \
    --n_samples 100 \
    --n_query 1 \
    --exp_name <exp_name> \
    --temperature <temperature>
bash run_exps/auto/run_all_<exp_name>.sh
```
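The three collection runs differ only in dataset and sample count, so a small wrapper loop can print all the command pairs for review before running them. This is our sketch, not a script shipped with the repo; `<model_name>`, `<temperature>`, and the `my_<dataset>` experiment names are placeholders:

```shell
# Dry run: print the create_exps / run_all command pair for each dataset.
# Remove the `echo`s (and fill in the placeholders) to actually execute.
for cfg in "math500 500" "game24 100" "sudoku 100"; do
  set -- $cfg   # $1 = dataset, $2 = sample count
  echo "python -m run_exps.create_exps --dataset $1 --model <model_name>" \
       "--mode reasoning --shot 0 --n_samples $2 --n_query 1" \
       "--exp_name my_$1 --temperature <temperature>"
  echo "bash run_exps/auto/run_all_my_$1.sh"
done
```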
MATH500:

```shell
python -m rejump_extractor.tree_vis_math_v3 \
    --dataset math500 \
    --model_name <model_name> \
    --temperature <temperature> \
    --num_samples 500 \
    --wandb
```

Game of 24:

```shell
python -m rejump_extractor.tree_vis_game24 \
    --dataset game24 \
    --model_name <model_name> \
    --temperature <temperature> \
    --num_samples 100 \
    --wandb
```

Sudoku:

```shell
python -m rejump_extractor.tree_vis_game24 \
    --dataset sudoku \
    --model_name <model_name> \
    --temperature <temperature> \
    --num_samples 100 \
    --wandb
```
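The abstract mentions ReJump-guided Best-of-N selection as a test-time application. As a toy illustration of the idea only (the scoring heuristic below is ours, not the paper's method, and all data structures are hypothetical), one could rank N candidate traces by a jump-based metric and keep the best:

```python
# Toy sketch of ReJump-guided Best-of-N selection. The "fewest jumps wins"
# heuristic is illustrative only, not the paper's actual selection rule.

def jump_fraction(trace, parent):
    """Fraction of transitions that are tree jumps (non-adjacent moves)."""
    def adjacent(a, b):
        return parent.get(a) == b or parent.get(b) == a
    pairs = list(zip(trace, trace[1:]))
    return sum(not adjacent(a, b) for a, b in pairs) / max(len(pairs), 1)

def best_of_n(traces, parent):
    """Among N candidate traces, prefer the one with the lowest jump
    fraction, i.e. the least backtracking under this toy metric."""
    return min(traces, key=lambda t: jump_fraction(t, parent))

# Example tree: r has child a; a has children b and c.
parent = {"r": None, "a": "r", "b": "a", "c": "a"}
direct = ["r", "a", "b"]                  # no jumps
wandering = ["r", "a", "b", "c", "b"]     # two sibling-to-sibling jumps
print(best_of_n([direct, wandering], parent))  # → ['r', 'a', 'b']
```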