This repository contains code for Yet Another Quantization Algorithm (YAQA), a quantization framework that uses a Kronecker-factored approximation of the layerwise Hessian with respect to the full-model KL divergence to better preserve model outputs after quantization.
YAQA reduces the KL divergence to the original model by roughly a factor of 1/3 relative to LDLQ/GPTQ across a wide range of models and quantizers, which translates to state-of-the-art performance on downstream tasks.
For more details, see the paper (https://arxiv.org/abs/2505.22988).
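As a rough illustration of what this means, here is a sketch of the layerwise objective; the notation and factor shapes below are our assumptions, not necessarily the paper's exact formulation. For a linear layer with weights $W \in \mathbb{R}^{m \times n}$ quantized to $\hat W$, adaptive rounding minimizes a quadratic proxy for the full-model KL divergence whose Hessian $H$ is approximated by a Kronecker product:

$$
\min_{\hat W}\;\operatorname{vec}(\hat W - W)^\top H \,\operatorname{vec}(\hat W - W),
\qquad H \approx A \otimes B,
\quad A \in \mathbb{R}^{n \times n},\; B \in \mathbb{R}^{m \times m}.
$$

With this factorization the quadratic form can be evaluated as $\operatorname{tr}\!\big(A\,(\hat W - W)^\top B\,(\hat W - W)\big)$, so the full $mn \times mn$ Hessian never has to be materialized.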
How to use this codebase
This codebase is based on the QTIP codebase, with modifications to support YAQA's quantization algorithm.
To collect Hessians, see the README in hessian_llama/.
To quantize models, follow the instructions in the QTIP codebase.
Prequantized models and Sketch-B Hessians (see paper) can be found here.
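If the prequantized checkpoints are published as standard Hugging Face model repositories, loading one may look roughly like the sketch below. The repo id is a placeholder, and the officially supported loading path (including any custom inference kernels) is the one described in the QTIP codebase.

```python
# Minimal sketch, assuming the prequantized checkpoints are hosted on the
# Hugging Face Hub. The repo id below is a placeholder, not a real checkpoint;
# see the QTIP README for the officially supported loader.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-org/llama-yaqa-2bit"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,
    device_map="auto",        # requires the `accelerate` package
    trust_remote_code=True,   # in case the repo ships custom quantized modules
)

prompt = "Quantization preserves model outputs when"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```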
Other
If you found this work useful, please consider citing:
@misc{tseng2025modelpreservingadaptiverounding,
      title={Model-Preserving Adaptive Rounding},
      author={Albert Tseng and Zhaofeng Sun and Christopher De Sa},
      year={2025},
      eprint={2505.22988},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2505.22988},
}
Use of Llama models is governed by the Llama Community License. Use of this code is governed by the GNU GPL v3 license.