Language Models Represent Beliefs of Self and Others

This repository provides the code for the paper "Language Models Represent Beliefs of Self and Others". The paper shows that LLMs internally represent the beliefs of both themselves and other agents, and that manipulating these representations can significantly change their Theory of Mind (ToM) reasoning.

Installation

conda create -n lm python=3.8 anaconda
conda activate lm
# Please install PyTorch (<2.4) according to your CUDA version.
conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt
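The comment above pins PyTorch below 2.4. The helper below is a minimal sketch for checking that pin before running anything. It is hypothetical and not part of this repository, and it parses only the major and minor version numbers.

```python
# Hypothetical helper (not part of this repo): check the "PyTorch < 2.4" pin.
def torch_version_ok(version: str, max_exclusive=(2, 4)) -> bool:
    """Return True if `version` (e.g. "2.3.1+cu121") is below `max_exclusive`."""
    # Drop local build tags such as "+cu121" before parsing.
    public = version.split("+")[0]
    parts = []
    for piece in public.split(".")[:2]:
        # Keep only leading digits so suffixes like "0a0" still parse.
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 2:
        parts.append(0)
    return tuple(parts) < max_exclusive
```

Usage: after installation, call `torch_version_ok(torch.__version__)`. It should return True for the 2.3.1 build installed by the conda command above.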

Then download the language models (e.g., Mistral-7B-Instruct-v0.2, deepseek-llm-7b-chat) to models/. Alternatively, specify custom model paths in lm_paths.json.
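The lookup below is a minimal sketch of how a model name might be resolved to a local path. It assumes, hypothetically, that lm_paths.json is a flat JSON object mapping model names to directories, with models/<name> as the fallback. Check the actual file and LM_hf.py for the real schema.

```python
import json
from pathlib import Path


def resolve_model_path(name, paths_file="lm_paths.json", models_dir="models"):
    """Return the local directory for model `name`.

    Hypothetical schema: lm_paths.json maps names to paths, e.g.
    {"Mistral-7B-Instruct-v0.2": "/data/hf/Mistral-7B-Instruct-v0.2"}.
    Falls back to models/<name> when the file or the entry is missing.
    """
    paths_file = Path(paths_file)
    if paths_file.is_file():
        with paths_file.open() as f:
            mapping = json.load(f)
        if name in mapping:
            return Path(mapping[name])
    return Path(models_dir) / name
```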

Extract Representations

sh scripts/save_reps.sh 0_forward belief
sh scripts/save_reps.sh 0_forward action
sh scripts/save_reps.sh 0_backward belief
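The two arguments to save_reps.sh appear to match the --dynamic and --variable options used by the probing scripts below. The paper probes representations per attention head. The sketch below assumes each layer's attention output is stored as one concatenated vector of num_heads * head_dim values (for example, 32 heads of 128 dimensions in Mistral-7B) and splits it into per-head vectors. The function names are hypothetical.

```python
from typing import Dict, List, Tuple


def split_heads(layer_activation: List[float], num_heads: int) -> List[List[float]]:
    """Split one layer's concatenated attention output into per-head vectors."""
    if len(layer_activation) % num_heads != 0:
        raise ValueError("activation size must be divisible by num_heads")
    head_dim = len(layer_activation) // num_heads
    return [layer_activation[h * head_dim:(h + 1) * head_dim] for h in range(num_heads)]


def flatten_sample(per_layer: List[List[float]], num_heads: int) -> Dict[Tuple[int, int], List[float]]:
    """Map one sample's per-layer activations to {(layer, head): head vector}."""
    return {
        (layer, head): vec
        for layer, act in enumerate(per_layer)
        for head, vec in enumerate(split_heads(act, num_heads))
    }
```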

Probing

Binary:

python probe.py --belief=protagonist --dynamic=0_forward --variable belief 
python probe.py --belief=oracle --dynamic=0_forward --variable belief
python probe.py --belief=protagonist --dynamic=0_forward --variable action 
python probe.py --belief=oracle --dynamic=0_forward --variable action
python probe.py --belief=protagonist --dynamic=0_backward --variable belief 
python probe.py --belief=oracle --dynamic=0_backward --variable belief
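Here --belief selects whose belief label is probed: the protagonist's, or the oracle's (the omniscient view of the story). A binary probe is a linear classifier fit on a single head's activations. The sketch below is a plain logistic-regression probe trained by full-batch gradient descent. It illustrates the idea and is not the code in probe.py; the learning rate and epoch count are arbitrary assumptions.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(X: Sequence[Sequence[float]], y: Sequence[int],
                lr: float = 0.1, epochs: int = 200) -> Tuple[List[float], float]:
    """Fit weights w and bias b of a logistic-regression probe."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_accuracy(w, b, X, y) -> float:
    """Fraction of samples whose predicted label (logit > 0) matches y."""
    correct = sum(
        int((sum(wj * xj for wj, xj in zip(w, xi)) + b > 0) == (yi == 1))
        for xi, yi in zip(X, y)
    )
    return correct / len(y)
```

In practice, each head's probe would be scored on a held-out split, and the resulting accuracies compared across layers and heads.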

Multinomial:

python probe_multinomial.py --dynamic=0_forward --variable belief
python probe_multinomial.py --dynamic=0_forward --variable action
python probe_multinomial.py --dynamic=0_backward --variable belief
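The multinomial probes take no --belief flag. This suggests they predict the oracle and protagonist labels jointly, as one of four classes. The sketch below shows one such encoding, plus decoding a 4-way softmax output back into the pair. The 2 * oracle + protagonist layout is an assumption for illustration.

```python
import math
from typing import List, Sequence, Tuple


def encode_joint(oracle: int, protagonist: int) -> int:
    """Hypothetical encoding of two binary labels as one class in [0, 4)."""
    if oracle not in (0, 1) or protagonist not in (0, 1):
        raise ValueError("labels must be 0 or 1")
    return 2 * oracle + protagonist


def decode_joint(label: int) -> Tuple[int, int]:
    """Inverse of encode_joint: class -> (oracle, protagonist)."""
    if not 0 <= label < 4:
        raise ValueError("label must be in [0, 4)")
    return divmod(label, 2)


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_joint(logits: Sequence[float]) -> Tuple[int, int]:
    """Pick the most probable joint class and decode it."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return decode_joint(best)
```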

BigToM Evaluation

sh scripts/0_forward_belief.sh
sh scripts/0_forward_action.sh
sh scripts/0_backward_belief.sh
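BigToM questions offer two answer options, so evaluation reduces to extracting the chosen option from the model's output and comparing it with the gold answer. The sketch below assumes, hypothetically, that the model marks its choice as "(a)"/"(b)" or "a)"/"b)". Outputs that cannot be parsed count as wrong. The repository's own scoring lives in its evaluation scripts and may use a different rule.

```python
import re
from typing import Optional, Sequence

# Hypothetical answer format: "(a)", "(b)", "a)" or "b)".
_CHOICE = re.compile(r"\(?\b([ab])\)")


def parse_choice(output: str) -> Optional[str]:
    """Return "a" or "b" if an option marker is found, else None."""
    m = _CHOICE.search(output.lower())
    return m.group(1) if m else None


def accuracy(outputs: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of outputs whose parsed choice matches the gold option."""
    if len(outputs) != len(gold):
        raise ValueError("outputs and gold must have the same length")
    correct = sum(parse_choice(o) == g.lower() for o, g in zip(outputs, gold))
    return correct / len(gold)
```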

Intervention

Intervention for the Forward Belief task:

sh scripts/0_forward_belief_interv_oracle.sh
sh scripts/0_forward_belief_interv_protagonist.sh
sh scripts/0_forward_belief_interv_o0p1.sh
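These scripts steer the model by shifting selected attention-head activations along a belief direction. The suffixes presumably name the targeted label (oracle, protagonist, or a combination such as o0p1). The sketch below shows one common variant in the style of Inference-Time Intervention. It computes a mass-mean direction (positive-class mean minus negative-class mean, normalized), measures the activations' standard deviation along it, and adds alpha * sigma * direction. Whether the scripts use this direction or the probe weights, and which alpha they use, is set in the scripts, not here.

```python
import math
from typing import List, Sequence


def mass_mean_direction(pos: Sequence[Sequence[float]],
                        neg: Sequence[Sequence[float]]) -> List[float]:
    """Unit vector pointing from the mean of `neg` to the mean of `pos`."""
    dim = len(pos[0])
    mu_pos = [sum(v[j] for v in pos) / len(pos) for j in range(dim)]
    mu_neg = [sum(v[j] for v in neg) / len(neg) for j in range(dim)]
    diff = [a - b for a, b in zip(mu_pos, mu_neg)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0:
        raise ValueError("class means coincide")
    return [d / norm for d in diff]


def projection_std(acts: Sequence[Sequence[float]], direction: Sequence[float]) -> float:
    """Population standard deviation of the activations projected onto `direction`."""
    projs = [sum(a * d for a, d in zip(v, direction)) for v in acts]
    mean = sum(projs) / len(projs)
    return math.sqrt(sum((p - mean) ** 2 for p in projs) / len(projs))


def shift(activation: Sequence[float], direction: Sequence[float],
          alpha: float, sigma: float) -> List[float]:
    """Add alpha * sigma * direction to one head's activation."""
    return [a + alpha * sigma * d for a, d in zip(activation, direction)]
```

Scaling by sigma keeps alpha comparable across heads whose activations have very different spreads.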

Cross-task intervention:

sh scripts/cross_0_forward_belief_to_forward_action_interv_o0p1.sh
sh scripts/cross_0_forward_belief_to_backward_belief_interv_o0p1.sh
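The script names suggest that the heads and directions are derived on the Forward Belief task and then applied while evaluating Forward Action or Backward Belief. The sketch below shows that bookkeeping under a stated assumption: intervention heads are the top-k heads ranked by source-task probe accuracy, and each keeps its source-task direction. The names and the ranking rule are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Head = Tuple[int, int]  # (layer, head)


def top_k_heads(val_acc: Dict[Head, float], k: int) -> List[Head]:
    """Return the k heads with the highest probe accuracy; ties go to lower (layer, head)."""
    return sorted(val_acc, key=lambda h: (-val_acc[h], h))[:k]


def build_intervention_plan(val_acc: Dict[Head, float],
                            directions: Dict[Head, Sequence[float]],
                            k: int) -> Dict[Head, Sequence[float]]:
    """Pair each selected source-task head with its source-task direction."""
    return {h: directions[h] for h in top_k_heads(val_acc, k)}
```

Because the plan is computed once on the source task, applying it to the target task isolates whether the same belief directions transfer across tasks.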

Citation

@inproceedings{zhu2024language,
    title={Language Models Represent Beliefs of Self and Others},
    author={Zhu, Wentao and Zhang, Zhining and Wang, Yizhou},
    booktitle={Forty-first International Conference on Machine Learning},
    year={2024}
}
