As SOTA network architectures continue to evolve, we need to ask whether techniques developed for one type of architecture (transformer LMs) can also be applied to another (Mamba), and to what extent the insights transfer across architectures.
In the context of factual recall, we investigate whether facts can be localized to particular modules and token positions in Mamba by adapting activation patching / causal tracing to Mamba.
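A minimal sketch of this kind of causal tracing is below, assuming a HuggingFace-format Mamba checkpoint whose blocks are reachable at `backbone.layers.{i}`; the checkpoint name, module paths, layer index, and subject token positions are illustrative assumptions, not the exact setup from the paper.

```python
# Causal-tracing sketch adapted to Mamba (assumptions: HuggingFace-style
# checkpoint; per-layer blocks reachable at "backbone.layers.{i}"; exact
# module names differ across Mamba implementations).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "state-spaces/mamba-1.4b-hf"   # assumed checkpoint name
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

def trace(prompt, subject_positions, layer, restore_pos, noise=3.0):
    """Corrupt the subject token embeddings with Gaussian noise, then restore
    the clean output of one layer at one position and read off the answer logit."""
    ids = tok(prompt, return_tensors="pt").input_ids
    block = model.get_submodule(f"backbone.layers.{layer}")

    # 1. Clean run: cache the chosen layer's output at every position.
    clean = {}
    def save_hook(mod, inp, out):
        clean["h"] = (out[0] if isinstance(out, tuple) else out).detach().clone()
    h1 = block.register_forward_hook(save_hook)
    with torch.no_grad():
        clean_logits = model(ids).logits[0, -1]
    h1.remove()

    # 2. Corrupted run: add noise to the subject embeddings, but splice the
    #    clean layer output back in at a single (layer, position).
    def corrupt_hook(mod, inp, out):
        out = out.clone()
        out[:, subject_positions] += noise * torch.randn_like(out[:, subject_positions])
        return out
    def restore_hook(mod, inp, out):
        h = out[0] if isinstance(out, tuple) else out
        h[:, restore_pos] = clean["h"][:, restore_pos]
        return out
    h2 = model.get_input_embeddings().register_forward_hook(corrupt_hook)
    h3 = block.register_forward_hook(restore_hook)
    with torch.no_grad():
        patched_logits = model(ids).logits[0, -1]
    h2.remove(); h3.remove()

    answer = clean_logits.argmax()
    return tok.decode(answer), patched_logits[answer].item()

# Does restoring layer 20's state at the subject's last token recover the fact?
# (Subject positions depend on tokenization; the indices here are illustrative.)
print(trace("The Eiffel Tower is located in the city of",
            subject_positions=[1, 2, 3], layer=20, restore_pos=3))
```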
We check whether we can edit facts with ROME in Mamba by directly editing one of the projection matrices in the MambaBlock architecture.
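As a sketch of the editing step itself, the closed-form ROME rank-one update below shows how a single (key, value) association can be written into a weight matrix; the key `k_star`, target value `v_star`, and key second-moment matrix `C` are assumed to have been computed as in the original ROME procedure, and which Mamba projection is best to edit is an empirical question, not fixed by this sketch.

```python
# ROME-style rank-one update applied to a projection weight W (e.g. a Mamba
# block's output projection). k_star, v_star, and C are assumed inputs here;
# this only illustrates the closed-form edit.
import torch

def rome_update(W: torch.Tensor, k_star: torch.Tensor, v_star: torch.Tensor,
                C: torch.Tensor) -> torch.Tensor:
    """Return W' such that W' @ k_star = v_star, with the standard ROME
    rank-one correction (minimal change in the C-weighted sense)."""
    # Shapes: W (d_out, d_in), k_star (d_in,), v_star (d_out,), C (d_in, d_in)
    Cinv_k = torch.linalg.solve(C, k_star)            # C^{-1} k*
    residual = v_star - W @ k_star                    # what the edit must add at k*
    scale = Cinv_k @ k_star                           # (C^{-1} k*)^T k*
    return W + torch.outer(residual / scale, Cinv_k)  # rank-one correction

# Shape-only example with random placeholders.
d_in, d_out = 16, 32
W = torch.randn(d_out, d_in)
C = torch.eye(d_in)                                   # placeholder covariance
k_star, v_star = torch.randn(d_in), torch.randn(d_out)
W_edited = rome_update(W, k_star, v_star, C)
print(torch.allclose(W_edited @ k_star, v_star, atol=1e-4))  # the new association holds
```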
We further investigate the extent to which insights from works such as Hernandez et al. (2023) and Geva et al. (2023) generalize to Mamba. We find it difficult to implement techniques similar to attention knockout (used in Geva et al. (2023)) in Mamba because of certain architectural choices.
If you find this work useful, please consider citing:
@article{sensharma2024locating,
  title={Locating and Editing Factual Associations in Mamba},
  author={Arnab Sen Sharma and David Atkinson and David Bau},
  year={2024},
  eprint={2404.03646},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}