Official implementation of stochastic error ascent (SEA), proposed in our paper: Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base.
If you find our work helpful, feel free to ⭐ the project!
How does SEA work?
SEA iterates over the following steps (a minimal Python sketch follows the list):
1. Retrieve the top-k error-related paragraphs from the knowledge base (see utils/sample.py:page_related_sampling).
2. Randomly sample a batch from the retrieved paragraphs.
3. Evaluate the testee LLM on the batch (see utils/qa_generator.py).
4. Construct a Relation DAG (see utils/qa_generator.py).
5. Prune the source errors (see utils/graph.py).
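To make the loop concrete, here is a minimal sketch of one SEA iteration. Every helper name (retrieve_error_related, generate_qa, evaluate, build_relation_dag, prune_source_errors) is a hypothetical stand-in for the corresponding utilities in utils/, not the repository's actual API:

```python
import random

# Hypothetical sketch of one SEA iteration; all helpers are illustrative
# stand-ins for utils/sample.py, utils/qa_generator.py, and utils/graph.py.
def sea_step(knowledge_base, error_set, testee_llm, top_k, qa_size):
    # 1. Retrieve the top-k paragraphs most related to previous errors
    #    (cf. page_related_sampling in utils/sample.py).
    candidates = retrieve_error_related(knowledge_base, error_set, top_k)
    # 2. Randomly sample a QA batch from the retrieved paragraphs.
    batch = random.sample(candidates, qa_size)
    # 3. Generate questions from the batch and evaluate the testee LLM
    #    (cf. utils/qa_generator.py).
    answers = evaluate(testee_llm, generate_qa(batch))
    # 4. Construct a Relation DAG over the evaluated paragraphs.
    dag = build_relation_dag(batch, answers)
    # 5. Prune source errors; the surviving errors seed the next step
    #    (cf. utils/graph.py).
    return prune_source_errors(dag, answers)
```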
How to use?
Preparation
Install Conda and create the conda environment (sea) with:
conda env create -f environment.yml
A Linux system is recommended.
Download the preprocessed Wikipedia data through this link. You can check the data format in examples/processed_document_example.json.
Specify the API key file for closed-source models by creating the corresponding .key file (see utils/devapi.py). Check MODEL_MAP in utils/devapi.py for which key file each model expects.
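As a purely hypothetical illustration of the convention (the actual filename each model expects is determined by MODEL_MAP in utils/devapi.py):

```python
from pathlib import Path

# Hypothetical illustration: the key file is plain text holding the API key.
# "openai.key" is an assumption; check MODEL_MAP in utils/devapi.py for the
# filename your model actually expects.
api_key = Path("openai.key").read_text().strip()
```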
Usage
All the following scripts log the LLM's per-step answers in res_same_topic[_abla] and res_same_seed. You can check the output format in examples/qa_example.json.
Run SEA
You can run SEA with the following command:
conda activate sea
python sea.py
In sea.py, you need to specify the data path (DATA_PATH) to the preprocessed Wikipedia data. That directory contains all preprocessed pages (id_name_abs_url_para) and a faiss index file (faiss_index_gte_para.ivf).
You can also adjust:
- the total budget (TOTAL_BUDGET),
- the per-step QA size (QA_SIZE),
- the number of rephrases (REPHRASE),
- the top-k value (TOP_K_PAGE),
- the embedding model (EMBEDDING_MODEL),
- the error and pruning thresholds (ERROR_THRES and PRUNING_THRES),
- the pruning method (cumacc or avgacc, where avgacc is an ablation of our method),
- the QA generator model (default: gpt-4o),
- the seed (default: 42).
You can also modify INIT_TOPICS to specify the search topics for the initial set. Try your own topic! A sketch of these settings follows.
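For orientation, the configuration block at the top of sea.py looks roughly like the following. The constant names come from the README above; the values are placeholders except the stated defaults (gpt-4o and seed 42):

```python
# Illustrative sea.py configuration; values marked "placeholder" are assumptions.
DATA_PATH = "path/to/preprocessed_wikipedia"  # holds id_name_abs_url_para and faiss_index_gte_para.ivf
TOTAL_BUDGET = 1000            # total QA budget (placeholder)
QA_SIZE = 50                   # per-step QA size (placeholder)
REPHRASE = 3                   # rephrase times (placeholder)
TOP_K_PAGE = 10                # top-k retrieved pages (placeholder)
EMBEDDING_MODEL = "gte"        # must match the embedder used to build the index
ERROR_THRES = 0.5              # error threshold (placeholder)
PRUNING_THRES = 0.5            # pruning threshold (placeholder)
PRUNING_METHOD = "cumacc"      # or "avgacc" (ablation of our method)
QA_MODEL = "gpt-4o"            # QA generator model (default)
SEED = 42                      # default seed
INIT_TOPICS = ["Mathematics"]  # initial search topics (placeholder)
```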
Results for the ablation studies are reported as Figure 3 in our paper.
How can we analyze SEA's results (Figures 5 and 6)?
We provide post-processing code in process_exist.py, where you can specify the results you want to format. After that, run the code in analysis.ipynb to visualize all the ablation and analysis results.
How to customize your knowledge base?
It is quite easy to customize your dataset! The download link in Preparation includes the data-processing code. For this processing, you will use emb_processing.py, indexing.py, and sep_file.py (remember to modify the paths).
(Step 1) Process your documents into the following format (see examples/processed_document_example.json; a hypothetical example follows this list). Specifically, each document should have:
A name (general title, like the general title of a Wikipedia page)
An abstract
A wikitext (in markdown format). Note that each section should be delimited by a title starting with a # or ## mark.
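A single processed document might look roughly like this (shown as a Python dict; the field names are assumptions, and the authoritative schema is examples/processed_document_example.json):

```python
# Hypothetical document entry; verify the field names against
# examples/processed_document_example.json.
doc = {
    "name": "Alan Turing",  # general title, like a Wikipedia page title
    "abstract": "Alan Turing was an English mathematician and computer scientist ...",
    "wikitext": "# Early life\n...\n## Education\n...",  # sections split by #/## titles
}
```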
(Step 2) After that, merge all your documents into a single json file (e.g., document.json) and run emb_processing.py and indexing.py (see the sketch after these steps). An important note: make sure the embedding model used to process the dataset is the same as the one used in SEA.
(Step 3) Separate your document.json into multiple sub-files using sep_file.py.
(Step 4) Specify the paths to your separated files and the index file in SEA (DATA_PATH) and start the discovery journey!
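For intuition only, the embed-and-index step (emb_processing.py plus indexing.py) amounts to something like the sketch below, assuming sentence-transformers for a GTE embedder and a FAISS IVF index. The model name and nlist value are assumptions, not the repository's actual choices:

```python
import faiss
from sentence_transformers import SentenceTransformer

# Sketch of the embed-and-index step; assumes sentence-transformers and
# faiss-cpu are installed. "thenlper/gte-large" and nlist=64 are assumptions.
paragraphs = ["...", "..."]  # every paragraph collected from document.json;
                             # you need at least `nlist` of them to train the index

model = SentenceTransformer("thenlper/gte-large")
embs = model.encode(paragraphs, normalize_embeddings=True).astype("float32")

d = embs.shape[1]
quantizer = faiss.IndexFlatIP(d)  # inner product == cosine on normalized vectors
index = faiss.IndexIVFFlat(quantizer, d, 64, faiss.METRIC_INNER_PRODUCT)
index.train(embs)  # IVF indexes must be trained before adding vectors
index.add(embs)
faiss.write_index(index, "faiss_index_gte_para.ivf")
```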
How to cite?
@misc{song2025discoveringknowledgedeficiencieslanguage,
title={Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base},
author={Linxin Song and Xuwei Ding and Jieyu Zhang and Taiwei Shi and Ryotaro Shimizu and Rahul Gupta and Yang Liu and Jian Kang and Jieyu Zhao},
year={2025},
eprint={2503.23361},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2503.23361},
}