MaskSearch: A Universal Pre-Training Framework to Enhance Agentic Search Capability
🚀 Introduction
We propose MaskSearch, a novel pre-training framework to further enhance the universal search capability of agents.
We introduce the Retrieval Augmented Mask Prediction (RAMP) task, in which the model learns to use search tools to fill masked spans in large-scale pre-training data, thereby acquiring universal retrieval and reasoning capabilities.
We combine agent-based and distillation-based methods to generate training data, starting with a multi-agent system consisting of a planner, a rewriter, and an observer, followed by a self-evolving teacher model.
Extensive experiments demonstrate that MaskSearch significantly enhances the performance of LLM-based search agents on both in-domain and out-of-domain downstream tasks.
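To make the RAMP task described above more concrete, here is a minimal sketch of how a masked training example might be built from a passage. It is an illustration only, not the repository's implementation: the function name `mask_spans`, the `[mask]` token, and the assumption that salient spans are supplied by the caller and do not overlap are all hypothetical.

```python
import random

MASK_TOKEN = "[mask]"  # assumed placeholder; the token actually used by MaskSearch may differ


def mask_spans(text, spans, num_masks, seed=0):
    """Replace up to `num_masks` of the given spans with MASK_TOKEN.

    Assumes the spans are distinct and non-overlapping.
    Returns the masked text and the hidden answers, ordered left to right.
    """
    rng = random.Random(seed)
    # Keep only spans that actually occur in the passage.
    present = [s for s in spans if s in text]
    chosen = rng.sample(present, min(num_masks, len(present)))
    # Order answers by position so they line up with the masks.
    chosen.sort(key=text.index)
    masked = text
    for span in chosen:
        # Mask only the first occurrence of each chosen span.
        masked = masked.replace(span, MASK_TOKEN, 1)
    return masked, chosen


passage = "Marie Curie won the Nobel Prize in Physics in 1903."
masked, answers = mask_spans(passage, ["Marie Curie", "1903"], num_masks=2)
print(masked)
print(answers)
```

In this sketch, the masked passage would serve as the model input and the hidden spans as the targets that the model has to recover, with search tools available to look up the missing information.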
💡 Performance
🛠 Running MaskSearch
Before running, replace the placeholders in src/RAMP/model.py, src/multi_agent/model.py, and src/multi_agent/web_news_get.py with your own Qwen API key and Google_search key.
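If you prefer not to hard-code keys in the source files, one option is to read them from environment variables. The sketch below is only an assumption about how this could be done: the variable names `QWEN_API_KEY` and `GOOGLE_SEARCH_API_KEY` and the helper `load_api_keys` are hypothetical and are not defined by the repository.

```python
import os


def load_api_keys(env=os.environ):
    """Read the two keys from environment variables (hypothetical names)."""
    keys = {
        "qwen": env.get("QWEN_API_KEY"),
        "google_search": env.get("GOOGLE_SEARCH_API_KEY"),
    }
    # Fail early with a clear message if any key is unset or empty.
    missing = [name for name, value in keys.items() if not value]
    if missing:
        raise RuntimeError("Missing API keys: " + ", ".join(missing))
    return keys
```

The returned values could then be substituted wherever the placeholders appear in the three files listed above.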
After generating the data, the third step is to train on it. For SFT, refer to the training process of LLaMA-Factory; for RL, refer to Search-R1 and ZeroSearch.
🙏 Acknowledgements
This work is implemented based on ChineseWiki, LLaMA-Factory, Search-R1, and verl. We greatly appreciate their valuable contributions to the community.
📝 Citation
@article{wu2025masksearchuniversalpretrainingframework,
title={MaskSearch: A Universal Pre-Training Framework to Enhance Agentic Search Capability},
author={Weiqi Wu and Xin Guan and Shen Huang and Yong Jiang and Pengjun Xie and Fei Huang and Jiuxin Cao and Hai Zhao and Jingren Zhou},
year={2025},
eprint={2505.20285},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2505.20285},
}