Inspecting and Editing Knowledge Representations in Language Models
Evan Hernandez, Belinda Z. Li, Jacob Andreas.
This repository provides an implementation of the Representation Mediation (REMEDI) method for autoregressive transformer language models.
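At a high level, REMEDI learns an editor that maps a hidden representation of an entity, together with a representation of an attribute, to a new hidden state that gets substituted into the LM's forward pass. The toy sketch below illustrates that idea in plain Python; it is a conceptual illustration only, and all names, weights, and dimensions are made up (the real editors are learned neural modules operating on LM hidden states):

```python
# Conceptual sketch of a REMEDI-style edit (illustrative only; the real
# editor is a learned module trained on transformer hidden states).

def affine_editor(entity_rep, attr_rep, weight, bias):
    """Map [entity; attribute] to an edited hidden state via one affine layer."""
    combined = entity_rep + attr_rep  # concatenate the two vectors
    return [
        sum(w * x for w, x in zip(row, combined)) + b
        for row, b in zip(weight, bias)
    ]

hidden_size = 4
entity_rep = [0.1, 0.2, 0.3, 0.4]  # hidden state at the entity token
attr_rep = [1.0, 0.0, -1.0, 0.5]   # encoding of the attribute text

# Toy weights that blend the attribute into the entity representation;
# REMEDI learns these from data.
weight = [
    [1.0 if j == i else (0.5 if j == i + hidden_size else 0.0)
     for j in range(2 * hidden_size)]
    for i in range(hidden_size)
]
bias = [0.0] * hidden_size

edited = affine_editor(entity_rep, attr_rep, weight, bias)
# `edited` would replace the entity's hidden state during the forward pass.
```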
All code is tested on macOS Ventura (>= 13.1) and Ubuntu 20.04 using Python >= 3.10. The code relies on several newer Python features, so the Python version requirement is strict.
To run the code, create a virtual environment with the tool of your choice, e.g. conda:
```shell
conda create --name remedi python=3.10
```

Then, after entering the environment, install the project dependencies:
```shell
python -m pip install invoke
invoke install
```

We cannot re-release the datasets used in the paper. However, you can download the raw datasets yourself and point our code to them:
- CounterFact: Available on the ROME website. Note that our code will automatically download this specific dataset for you.
- Bias in Bios: Must be downloaded using the official code release. When running a REMEDI script, set `--dataset-file <pkl file>` to point to the resulting pickle file.
- McRae Norms: Download the supplemental material of this paper and set `--dataset-file <path to download>/CONCS_FEATS_concstats_brm.txt`.
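For the datasets passed via `--dataset-file`, it can help to sanity-check that the downloaded file is readable before launching a long job. The snippet below is a minimal, self-contained sketch (it fabricates a stand-in pickle in a temporary directory; in practice you would point `load_dataset_file` at your real Bias in Bios download, and the record fields will differ):

```python
import pickle
import tempfile
from pathlib import Path

def load_dataset_file(path):
    """Load a pickle like the one passed via --dataset-file and report its size."""
    with Path(path).open("rb") as handle:
        data = pickle.load(handle)
    print(f"Loaded {len(data)} records from {path}")
    return data

# Self-contained demo with a fabricated stand-in file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.pkl"
    demo_path.write_bytes(pickle.dumps([{"text": "example"}] * 3))
    records = load_dataset_file(demo_path)
```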
All experiments from the paper can be run through invoke. To see the full list, run:
```shell
invoke --list
```

Any task prefixed with `x.` corresponds to an experiment. The invoke scripts have the hyperparameters from the paper baked into them. Most experiments support two flags: `--device` to specify the GPU, and `--model` to specify which LM to use (default: GPT-J).
The code supports training editors for most GPT variants: GPT2*, GPT-J, and GPT-NeoX (though GPT-NeoX with gradients is too big for most single GPUs). In principle, the code also supports any autoregressive transformer LM, but this may require slightly modifying `determine_hidden_size` and `determine_layers` inside the `models` module.
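Those two functions only need to answer two questions about a model: how wide its hidden states are and which layers it has. The sketch below shows the kind of config-based dispatch involved; the attribute names mirror common Hugging Face conventions, and this is not the repository's exact implementation:

```python
def determine_hidden_size(config):
    """Read the hidden width from whichever attribute this config family uses."""
    for attr in ("hidden_size", "n_embd", "d_model"):
        if hasattr(config, attr):
            return getattr(config, attr)
    raise ValueError(f"unknown config type: {type(config).__name__}")

def determine_layers(config):
    """Return the indices of all transformer layers."""
    for attr in ("num_hidden_layers", "n_layer"):
        if hasattr(config, attr):
            return tuple(range(getattr(config, attr)))
    raise ValueError(f"unknown config type: {type(config).__name__}")

# Demo with a stand-in config object using GPT-2 style attribute names.
class FakeGPT2Config:
    n_embd = 768
    n_layer = 12

print(determine_hidden_size(FakeGPT2Config()))  # 768
print(determine_layers(FakeGPT2Config())[:3])   # (0, 1, 2)
```

Supporting a new architecture then amounts to adding its config attribute names to lookups like these.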
To run training with the default configuration, use invoke, e.g.:
```shell
invoke x.train.counterfact --device cuda
```

For more fine-grained control over the hyperparameters, run the training script directly, e.g.:

```shell
python -m scripts.train_editors \
    -n my_custom_editors \
    -m gptj \
    -d counterfact \
    -l 0 1 2 \
    --lam-kl 100 \
    --device cuda
```

The help strings for each command contain most of what you need to know.
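The `--lam-kl` flag presumably weights a KL-divergence term that discourages the edited model's predictions from drifting away from the original model's, keeping edits targeted. The sketch below shows that generic pattern with toy numbers and pure Python; it is not the repository's actual loss code:

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def training_loss(edit_loss, p_orig, p_edited, lam_kl=100.0):
    """Editing objective plus a weighted penalty for drifting from the
    original model's next-token distribution."""
    return edit_loss + lam_kl * kl_divergence(p_orig, p_edited)

p_orig = [0.7, 0.2, 0.1]    # original model's next-token probabilities
p_edited = [0.6, 0.3, 0.1]  # the same distribution after applying an edit
loss = training_loss(edit_loss=0.5, p_orig=p_orig, p_edited=p_edited)
```

A larger `--lam-kl` trades edit strength for faithfulness to the unedited model.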
After training editors, you can evaluate them on any of the benchmarks considered in the paper. If you trained them via invoke, this is as simple as running another invoke command, typically one prefixed with `x.eval`, e.g.:

```shell
invoke x.eval.gen.counterfact --device cuda
```

...which evaluates REMEDI on generation quality in CounterFact.
Alternatively, as before, you can call the evaluation scripts directly:
```shell
python -m scripts.eval_fact_gen \
    -n my_custom_eval \
    -e results/my_custom_editors \
    -m gptj \
    -l 1 \
    --device cuda
```

While this library is not designed for industrial use (it's just a research project), we do believe research code should support reproducibility. If you have issues running our code in the supported environment, please open an issue on this repository.
If you find ways to improve our code, you may also submit a pull request. Before doing so, please ensure that the code type checks, lints cleanly, and passes all unit tests. The following command should exit cleanly:
```shell
invoke presubmit
```

If you use this code, please cite the paper:

```bibtex
@InProceedings{hernandez2023remedi,
    title = {Inspecting and Editing Knowledge Representations in Language Models},
    author = {Hernandez, Evan and Li, Belinda Z. and Andreas, Jacob},
    booktitle = {arXiv},
    year = {2023},
    url = {https://arxiv.org/abs/2304.00740}
}
```