This repository contains the official implementation of our paper *Bootstrapping Language Models with DPO Implicit Rewards*. We show that the implicit reward model obtained from prior DPO training can be used to bootstrap and further align LLMs.
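As background, DPO induces an implicit reward of the form r(x, y) = β · log(π_θ(y|x) / π_ref(y|x)), which this repo uses to score and rank new responses. The sketch below illustrates that computation on toy numbers; the log-probabilities, β value, and helper name are illustrative, not taken from this codebase.

```python
def dpo_implicit_reward(policy_logprob: float, ref_logprob: float, beta: float = 0.1) -> float:
    """DPO implicit reward: r(x, y) = beta * log(pi_theta(y|x) / pi_ref(y|x)).

    `policy_logprob` and `ref_logprob` are the sequence log-probabilities of
    response y under the DPO-tuned policy and the reference model.
    """
    return beta * (policy_logprob - ref_logprob)

# Toy example: rank two candidate responses by their implicit rewards to
# form a new preference pair for the next DPO round (the bootstrapping idea).
candidates = {"a": (-12.0, -15.0), "b": (-14.0, -13.5)}  # (policy, ref) logprobs
rewards = {k: dpo_implicit_reward(p, r) for k, (p, r) in candidates.items()}
chosen = max(rewards, key=rewards.get)    # higher implicit reward -> "chosen"
rejected = min(rewards, key=rewards.get)  # lower implicit reward -> "rejected"
```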
Set `DICE_DIR` to the local path of this repo in the following two files:

- `scripts/run_dice/iter.sh`
- `scripts/run_dice/pipeline.sh`

E.g., `DICE_DIR="/home/username/dice"`.
## Training scripts
We provide sample training scripts for both the Llama3 and Zephyr settings. We recommend running them on 8x A100 GPUs; for other hardware environments, you may need to adjust the scripts.
### Llama3

```bash
bash scripts/run_dice/iter.sh llama3
```

### Zephyr

```bash
bash scripts/run_dice/iter.sh zephyr
```
## Acknowledgement
This repo is built on LLaMA-Factory. Thanks for the amazing work!
## Citation
Please consider citing our paper if you find the repo helpful in your work:
```bibtex
@inproceedings{chen2025bootstrapping,
  title={Bootstrapping Language Models with DPO Implicit Rewards},
  author={Chen, Changyu and Liu, Zichen and Du, Chao and Pang, Tianyu and Liu, Qian and Sinha, Arunesh and Varakantham, Pradeep and Lin, Min},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2025}
}
```