We study how fine-tuning affects the internal mechanisms implemented in language models. As a case study, we explore the task of entity tracking in LLaMA-7B and its fine-tuned variants: Vicuna-7B, Goat-7B, and Float-7B.
Our findings suggest that fine-tuning enhances, rather than fundamentally alters, the mechanistic operation of the model.
Moreover, to uncover why fine-tuned models perform better while using the same mechanism, we introduce CMAP (Cross-Model Activation Patching), a method that patches activations across models to identify which components drive the improvement. The notebook experiment_3/cmap.ipynb demonstrates how to run the complete experiment.
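For illustration, the sketch below shows one way cross-model activation patching can be implemented with plain PyTorch forward hooks: an activation from a fine-tuned model is cached and substituted into the base model at the same layer. The model names, layer index, and prompt are placeholders and not taken from the paper's code; refer to experiment_3/cmap.ipynb for the actual experiment.

```python
# Minimal sketch of cross-model activation patching, assuming HuggingFace
# LLaMA-style models. Names, layer index, and prompt are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "huggyllama/llama-7b"        # base model (requires access to LLaMA weights)
FINETUNED = "lmsys/vicuna-7b-v1.3"  # a fine-tuned variant

tokenizer = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16, device_map="auto")
tuned = AutoModelForCausalLM.from_pretrained(FINETUNED, torch_dtype=torch.float16, device_map="auto")

prompt = "Box A contains the key, Box B contains the coin. Box A contains the"
inputs = tokenizer(prompt, return_tensors="pt").to(base.device)

LAYER = 20  # illustrative layer index
cached = {}

def save_hook(module, inp, out):
    # Cache the residual-stream output of the chosen layer in the fine-tuned model.
    hidden = out[0] if isinstance(out, tuple) else out
    cached["resid"] = hidden.detach()

def patch_hook(module, inp, out):
    # Replace the base model's layer output with the cached fine-tuned activation.
    patched = cached["resid"]
    if isinstance(out, tuple):
        return (patched.to(out[0].device),) + out[1:]
    return patched.to(out.device)

# 1) Run the fine-tuned model and cache the activation.
h = tuned.model.layers[LAYER].register_forward_hook(save_hook)
with torch.no_grad():
    tuned(**inputs)
h.remove()

# 2) Run the base model with that activation patched in at the same layer.
h = base.model.layers[LAYER].register_forward_hook(patch_hook)
with torch.no_grad():
    logits = base(**inputs).logits
h.remove()

print(tokenizer.decode(logits[0, -1].argmax()))
```

Comparing the patched base model's prediction against its unpatched prediction indicates how much of the fine-tuned model's improvement is carried by that layer's activations.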
Note: You need the weights for the LLaMA-7B model, which is under a non-commercial license. Use this form to request access to the model if you do not already have it.
@inproceedings{prakash2023fine,
  title={Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking},
  author={Prakash, Nikhil and Shaham, Tamar Rott and Haklay, Tal and Belinkov, Yonatan and Bau, David},
  booktitle={Proceedings of the 2024 International Conference on Learning Representations},
  note={arXiv:2402.14811},
  year={2024}
}
About
This repository contains the code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking".