Language Models Represent Beliefs of Self and Others
This repository provides the code for the paper "Language Models Represent Beliefs of Self and Others". The paper shows that LLMs internally represent their own beliefs and those of other agents, and that manipulating these representations can significantly affect their Theory of Mind reasoning.
Installation
conda create -n lm python=3.8 anaconda
conda activate lm
# Please install PyTorch (<2.4) according to your CUDA version.
conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt
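Since the install step asks for PyTorch below 2.4, it may help to check the installed version before running any scripts. The helper below is a minimal sketch and not part of this repository: `version_lt` is a hypothetical name, and it assumes a `sort` that supports `-V` (GNU coreutils). It strips local build suffixes such as `+cu121` before comparing.

```shell
# Hypothetical helper (not part of this repository).
# Succeeds (exit status 0) when version $1 is strictly lower than version $2.
# Assumes GNU coreutils `sort -V` is available.
version_lt() {
    a=${1%%+*}   # drop a local build suffix such as "+cu121"
    b=${2%%+*}
    [ "$a" != "$b" ] &&
        [ "$(printf '%s\n%s\n' "$a" "$b" | sort -V | head -n 1)" = "$a" ]
}
```

With the `lm` environment active, one possible use is `version_lt "$(python -c 'import torch; print(torch.__version__)')" 2.4 || echo "PyTorch must be below 2.4"`.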
sh scripts/0_forward_belief.sh
sh scripts/0_forward_action.sh
sh scripts/0_backward_belief.sh
Intervention
Intervention for the Forward Belief task:
sh scripts/0_forward_belief_interv_oracle.sh
sh scripts/0_forward_belief_interv_protagonist.sh
sh scripts/0_forward_belief_interv_o0p1.sh
Cross-task intervention:
sh scripts/cross_0_forward_belief_to_forward_action_interv_o0p1.sh
sh scripts/cross_0_forward_belief_to_backward_belief_interv_o0p1.sh
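To run several of the scripts above back to back, a small wrapper that stops at the first failure can save time. This is a sketch under assumptions: `run_in_order` is a hypothetical helper, not part of this repository, and whether the intervention scripts depend on outputs from earlier scripts is not stated here, so the order in the example call is only illustrative.

```shell
# Hypothetical helper (not part of this repository).
# Runs each script passed as an argument with `sh`, in the order given,
# and stops at the first one that exits with a non-zero status.
run_in_order() {
    for script in "$@"; do
        echo "Running $script"
        sh "$script" || { echo "Failed: $script" >&2; return 1; }
    done
}

# Example call (uncomment to use; the order shown is an assumption):
# run_in_order \
#     scripts/0_forward_belief_interv_oracle.sh \
#     scripts/0_forward_belief_interv_protagonist.sh \
#     scripts/0_forward_belief_interv_o0p1.sh
```

Returning a non-zero status on the first failure keeps later scripts from running against incomplete results.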
Citation
@inproceedings{zhu2024language,
title={Language Models Represent Beliefs of Self and Others},
author={Zhu, Wentao and Zhang, Zhining and Wang, Yizhou},
booktitle={Forty-first International Conference on Machine Learning},
year={2024}
}
About
[ICML 2024] Language Models Represent Beliefs of Self and Others