This is the repository for *Mitigating Bias in RAG: Controlling the Embedder*! It includes the GenderBias-QA and PoliticBias-QA datasets, as well as the training and inference code to reproduce our results. The goal of this repo is to bias the embedder on these datasets so that the overall RAG system becomes unbiased.
- Getting Started
- Sorting Documents
- Fine-tuning an Embedder
- Evaluation and Inferencing
- Evaluating Projections
This repository includes:
- Datasets: GenderBias-QA and PoliticBias-QA
- Training Script: Fine-tune an embedder with PEFT or WiSE-FT to adjust its bias
- Evaluation Script: Evaluate the performance of the biased embedder within a full RAG pipeline
- Projection Script: Assess projections in the embedding space for biasing retrieved documents
## Getting Started

First, install the Python packages by running:

```bash
conda env create --file environment.yml
```

Second, enter your API keys and cache directory in `env_config.sh` and run:
```bash
source env_config.sh
```

## Sorting Documents

To prepare for contrastive fine-tuning, you should first sort positive and negative documents from a corpus. To do this, run:
```bash
bash run-bounds.sh
```

This command runs `src/rag-get-bound.py`, which saves the indices of the positive and negative documents in `dataset/filtered_bias_index`. The saved indices are later used during fine-tuning.
Here are some of the main arguments to `rag-get-bound.py`:

- `-query_d`: Chooses the bias dataset (either `PoliticBias-QA` or `GenderBias-QA`).
- `-corpus_d`: Specifies the corpus from which the documents are sorted. Possible choices include `mteb/msmarco`, `mteb/dbpedia`, `mteb/fever`, `mteb/nq`, `mteb/hotpotqa`, `webis/args_me`, `webis-argument-framing`, `webis/conclugen`, and `Pol_NLI`.
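Conceptually, the sorting step ranks corpus documents against the bias queries and keeps the indices of the most and least similar ones as positive/negative pairs for contrastive fine-tuning. The sketch below is a minimal illustration of this idea, not the repo's actual implementation; the cosine-similarity scoring and top-/bottom-`k` selection are assumptions:

```python
import numpy as np

def sort_documents(query_embs: np.ndarray, doc_embs: np.ndarray, k: int = 5):
    """Rank corpus documents by similarity to the bias queries and return
    indices of the most (positive) and least (negative) similar documents."""
    # Cosine similarity between every query and every document.
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    sims = q @ d.T                       # shape: (n_queries, n_docs)
    order = np.argsort(-sims, axis=1)    # best-first per query
    positives = order[:, :k]             # top-k documents per query
    negatives = order[:, -k:]            # bottom-k documents per query
    return positives, negatives

# Toy usage: 2 queries, 10 documents, 8-dim embeddings.
rng = np.random.default_rng(0)
pos, neg = sort_documents(rng.normal(size=(2, 8)), rng.normal(size=(10, 8)), k=3)
print(pos.shape, neg.shape)  # (2, 3) (2, 3)
```

In the repo itself, `run-bounds.sh` performs this over the real corpora and saves the resulting indices to `dataset/filtered_bias_index`.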
## Fine-tuning an Embedder

After you have sorted your documents for training, you can fine-tune an embedder to change its bias by running:

```bash
bash run-train.sh
```

This command runs `embedder/train.py` to train the embedder and evaluate both its accuracy and bias. The evaluation results are saved to a JSON file. By default, the trained model is not saved unless you provide the `--save_dir` argument.
Here are some of the main arguments to `train.py`:

- `--bias_dataset`: `GenderBias-QA` or `PoliticBias-QA`.
- `--lr`: Learning rates to use (e.g., `[3e-5, 1e-5]`).
- `--epochs`: Number of epochs (e.g., `[5, 10, 15]`). Setting this to 0 evaluates the model without training.
- `--freeze_layers`: Number of layers to freeze (e.g., `[0, 1, 2, 3, 4]`).
- `--merge_weight`: Weights for merging (e.g., `[0.1, 0.3, 0.5, 0.7, 0.9]`).
- `--train_corpus`: Specifies the training corpus; join multiple corpus names with a hyphen to merge them (e.g., `nq-msmarco-conclugen`). Allowed corpus names: `msmarco`, `fever`, `dbpedia`, `nq`, `hotpotqa`, `conclugen`, `argument`, `args_me`.
- `--save_dir`: Optionally provide a directory to save the trained model; if omitted, only the accuracy and bias evaluations are saved to a JSON file.
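The WiSE-FT merging selected by `--merge_weight` linearly interpolates the base and fine-tuned weights. A minimal sketch of that interpolation, under the assumption that weights are held in plain name-to-array dictionaries (the repo's actual state handling may differ):

```python
import numpy as np

def wise_ft_merge(base_state: dict, tuned_state: dict, alpha: float) -> dict:
    """WiSE-FT style weight-space ensembling:
    merged = (1 - alpha) * base + alpha * fine-tuned."""
    return {name: (1 - alpha) * base_state[name] + alpha * tuned_state[name]
            for name in base_state}

# Toy usage with single-parameter "models".
base = {"w": np.zeros(3)}
tuned = {"w": np.ones(3)}
merged = wise_ft_merge(base, tuned, alpha=0.3)
print(merged["w"])  # [0.3 0.3 0.3]
```

Smaller merge weights keep the model closer to the base embedder; larger ones move it toward the fine-tuned (re-biased) weights.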
The fine-tuning code is adapted from SGPT.
To automatically generate the commands for fine-tuning multiple embedders with varying configurations (e.g., learning rate, epochs, number of frozen layers), modify `experiments/make_commands.py` and run:

```bash
cd experiments
python make_commands.py
```

This creates a text file in which each line is a command to run one fine-tuning job.
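The command-generation pattern amounts to taking the Cartesian product of a hyperparameter grid. A minimal sketch, assuming a hypothetical grid (the flag values and fixed arguments here are illustrative, not the repo's defaults):

```python
import itertools

# Hypothetical hyperparameter grid; keys mirror train.py's arguments.
grid = {
    "--lr": [3e-5, 1e-5],
    "--epochs": [5, 10],
    "--freeze_layers": [0, 2],
}

def make_commands(grid: dict) -> list[str]:
    """One training command per combination of grid values."""
    keys = list(grid)
    commands = []
    for values in itertools.product(*(grid[k] for k in keys)):
        flags = " ".join(f"{k} {v}" for k, v in zip(keys, values))
        commands.append(f"python embedder/train.py --bias_dataset GenderBias-QA {flags}")
    return commands

commands = make_commands(grid)
print(len(commands))  # 8 (2 * 2 * 2 combinations, one job per line)
```

Writing each command on its own line makes it easy to launch the jobs sequentially or dispatch them to a cluster scheduler.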
## Evaluation and Inferencing

After an embedder has been trained, you can test its performance within a full RAG pipeline; this evaluates the bias of both the embedder and the RAG pipeline. To evaluate, run:

```bash
bash run-inference.sh
```

This command runs `src/rag-inference.py`, which evaluates the bias of a full RAG pipeline. You will need to provide the LLM, corpus, and embedder for evaluation.
Here are some of the main arguments to `rag-inference.py`:

- `-query_d`: Specifies the queries to retrieve documents for: `GenderBias-QA` or `PoliticBias-QA`.
- `-corpus_d`: Specifies the corpus for inference: `mteb/nq`, `mteb/hotpotqa`, or `Pol_NLI`.
- `-llm`: Specifies the Hugging Face LLM to run inference with.
- `-rm`: Specifies the embedder to retrieve documents with. Select a base embedder from Hugging Face or provide the directory of a trained embedder.
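The pipeline being evaluated has the usual RAG shape: the embedder retrieves the top-k documents for a query, and the retrieved documents are placed in the LLM's prompt. A minimal sketch of that flow (the prompt template, stub LLM, and function names here are illustrative assumptions, not the repo's code):

```python
import numpy as np

def rag_answer(query, query_emb, doc_embs, docs, llm, k=2):
    """Retrieve the top-k documents with the embedder, then pass
    them as context to the LLM."""
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    top = np.argsort(-(d @ q))[:k]              # best-first document indices
    context = "\n".join(docs[i] for i in top)
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return llm(prompt)

# Toy usage: the "LLM" is a stub that simply returns its prompt.
rng = np.random.default_rng(1)
docs = ["doc A", "doc B", "doc C"]
answer = rag_answer("Who?", rng.normal(size=4), rng.normal(size=(3, 4)),
                    docs, llm=lambda prompt: prompt, k=2)
print("Question: Who?" in answer)  # True
```

Because the retrieved context shapes the LLM's answer, changing the embedder (via fine-tuning or projection) shifts the bias of the whole pipeline.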
## Evaluating Projections

You can also bias embedders by projecting in the embedding space instead of fine-tuning. To do so, run:

```bash
bash run-projection.sh
```

This command runs `src/rag-projection.py`, which evaluates the bias of the full RAG pipeline using projections in a base embedder (`gte-base` by default).
Here are some of the main arguments to `rag-projection.py`:

- `-query_d`: Specifies the queries to retrieve documents for: `GenderBias-QA` or `PoliticBias-QA`.
- `-corpus_d`: Specifies the corpus for inference: `mteb/nq`, `mteb/hotpotqa`, or `Pol_NLI`.
- `-llm`: Specifies the Hugging Face LLM to run inference with.
- `-p_word`: Specifies the word embedding to project on. Should be `female` for `GenderBias-QA` and `republican` for `PoliticBias-QA`.
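One simple way to bias retrieval with a word embedding is to shift the query embedding along the word's direction before computing similarities. The sketch below illustrates that idea only; the additive form, the `strength` parameter, and the function name are assumptions, not the projection used in `rag-projection.py`:

```python
import numpy as np

def bias_by_projection(query_emb, word_emb, strength=0.5):
    """Shift a query embedding along a word direction (e.g. the embedding
    of 'female' or 'republican') so retrieval favors documents aligned
    with that concept."""
    w_hat = word_emb / np.linalg.norm(word_emb)
    shifted = query_emb + strength * w_hat
    return shifted / np.linalg.norm(shifted)  # renormalize for cosine retrieval

# Toy usage: the shifted query aligns more with the word direction.
rng = np.random.default_rng(2)
q, word = rng.normal(size=8), rng.normal(size=8)
q_biased = bias_by_projection(q, word, strength=1.0)
w_hat = word / np.linalg.norm(word)
print(q_biased @ w_hat > (q / np.linalg.norm(q)) @ w_hat)  # True
```

Unlike fine-tuning, this leaves the embedder's weights untouched; only the embeddings used at retrieval time are modified.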
If you find this work useful, please cite our paper:
```bibtex
@misc{kim2025mitigatingbiasragcontrolling,
      title={Mitigating Bias in RAG: Controlling the Embedder},
      author={Taeyoun Kim and Jacob Springer and Aditi Raghunathan and Maarten Sap},
      year={2025},
      eprint={2502.17390},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.17390},
}
```