Jaewoo Lee | Archiki Prasad | Justin Chih-Yao Chen | Zaid Khan | Elias Stengel-Eskin | Mohit Bansal
Long-horizon information-seeking tasks require agents to gather and synthesize information across multiple reasoning steps and tool interactions. While process reward models (PRMs) can guide agents by ranking candidate steps at test time, existing PRMs neither capture the richer dimensions of information-seeking steps nor handle the rapidly growing context of long-horizon tasks. We propose PRInTS (Process Reward via Information gain scoring and Trajectory Summary), a generative PRM jointly trained on two key abilities that enable fine-grained guidance despite accumulating context.
🎯 PRInTS as a scorer: evaluates an agent's candidate next trajectory steps based on the summarized context and the current tool response, producing dense scores grounded in the PRM's reasoning across multiple step-quality dimensions (e.g., interpretation of tool outputs, informativeness of tool calls).
📝 PRInTS as a summarizer: recursively updates a compact summary of the information-seeking trajectory, keeping the input length bounded while preserving the key information needed for subsequent scoring (see the sketch after this list).
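Together, these two abilities let PRInTS guide an agent at test time via best-of-N step selection. Below is a minimal sketch of that loop, assuming hypothetical `agent` and `prm` interfaces (with `score` and `summarize` methods) that stand in for the trained PRM; the concrete prompts and parsing live in this repo's training and evaluation code.

```python
# Minimal sketch of PRInTS-guided best-of-N step selection at test time.
# `agent` and `prm` are hypothetical interfaces, not this repo's actual API:
# prm.score plays "PRInTS as a scorer", prm.summarize "PRInTS as a summarizer".

def guided_rollout(agent, prm, task, num_candidates=4, max_steps=20):
    summary, tool_response = "", ""  # compact summary + latest tool output
    for _ in range(max_steps):
        # The agent proposes several candidate next steps.
        candidates = [agent.propose(task, summary, tool_response)
                      for _ in range(num_candidates)]
        # Scorer: dense score per candidate, conditioned on the summarized
        # context and the current tool response; keep the best candidate.
        best = max(candidates,
                   key=lambda c: prm.score(summary, tool_response, c))
        tool_response = agent.execute(best)
        if agent.is_final(best):
            return best
        # Summarizer: recursively fold the executed step into the summary
        # so the PRM's input stays bounded on long horizons.
        summary = prm.summarize(summary, best, tool_response)
    return None
```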
Please follow the installation instructions from verl.
Our data annotation pipeline is built on the Inspect Eval evaluation framework. Please follow the installation instructions from Inspect Eval. Download the QA corpora from the MiroVerse and webagent families, and store them in the /webagent_corpus_directory directory.
For scoring annotation, run

```bash
cd inspect_evals
inspect eval inspect_evals/webagent
```

Save the score annotation logs into /annotated_data_dir/annotation_raw_trajectory.json, then run

```bash
python preprocess_trajectory.py
```

For summary annotation, run

```bash
inspect eval inspect_evals/summary_generator
```

Save the summary annotation logs into /annotated_data_dir/annotation_raw_trajectory_summary.json, then run

```bash
python preprocess_trajectory_summary.py
```

Now construct the datasets for both GRPO and SFT:

```bash
cd ..
python examples/data_preprocess/prints_grpo_dataset.py \
    --data_path /annotated_data_dir/annotated_sample_summary.json \
    --local_dir benchmarks/PRInTS_infogain_annotation \
    --tokenizer_path Qwen/Qwen3-4B \
    --max_prompt_length 6144 \
    --use_scoring --use_comparison
python examples/data_preprocess/prints_sftdataset.py \
    --data_path /annotated_data_dir/annotated_sample_summary.json \
    --local_dir benchmarks/PRInTS_summary_annotation \
    --tokenizer_path Qwen/Qwen3-4B \
    --max_prompt_length 8192
```

Download our PRInTS from Hugging Face:
| Model | Download Link |
|---|---|
| PRInTS | |
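Once downloaded, the checkpoint should load like any Qwen3-based causal LM with 🤗 Transformers. A minimal sketch; `CHECKPOINT` below is a placeholder for the released repo id or local path:

```python
# Load PRInTS as a standard causal LM (sketch; CHECKPOINT is a placeholder).
from transformers import AutoModelForCausalLM, AutoTokenizer

CHECKPOINT = "path/to/PRInTS"  # replace with the released checkpoint

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForCausalLM.from_pretrained(
    CHECKPOINT, torch_dtype="auto", device_map="auto"
)
```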
We train PRInTS on Qwen3-4B with our alternating SFT-GRPO training schedule.
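Conceptually, the schedule alternates between an SFT phase on the summary-annotation dataset and a GRPO phase on the information-gain scoring dataset built above. A high-level sketch, with hypothetical `run_sft` / `run_grpo` callables standing in for the actual verl entry points:

```python
# High-level sketch of the alternating SFT-GRPO schedule.
# run_sft / run_grpo are hypothetical phase runners; the real pipeline is
# the verl script below.

def alternating_training(model, run_sft, run_grpo, sft_data, grpo_data,
                         rounds=3):
    for _ in range(rounds):
        # SFT phase: supervised fine-tuning on trajectory-summary targets.
        model = run_sft(model, sft_data)
        # GRPO phase: group-relative policy optimization on step-scoring
        # rewards derived from the information-gain annotations.
        model = run_grpo(model, grpo_data)
    return model
```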
To launch training, run

```bash
bash examples/grpo_trainer/run_qwen3-4b_PRInTS_iterative_lr1e6.sh
```

For evaluation, we use the Inspect Eval evaluation pipeline and implement FRAMES, GAIA, and WebWalkerQA on top of the framework.
```bibtex
@article{lee2025prints,
  title={PRInTS: Reward Modeling for Long-Horizon Information Seeking},
  author={Jaewoo Lee and Archiki Prasad and Justin Chih-Yao Chen and Zaid Khan and Elias Stengel-Eskin and Mohit Bansal},
  year={2025},
  journal={arXiv preprint arXiv:2511.19314},
  url={https://arxiv.org/abs/2511.19314},
}
```
