We provide the training scripts in the scripts folder. For example, to perform W4A4 quantization for LLaMA-7B, run

sh scripts/llama-7b/w4a4.sh

Remember to change the model path (model) and the output path (output_dir) in the script.
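As a rough sketch, the two paths to edit before running would look something like the following. The variable names model and output_dir come from the instruction above; the example values are placeholders, not paths from this repository.

```shell
# Hypothetical values for the two paths referenced above;
# replace them with your own before running the script.
model=/path/to/llama-7b            # local checkpoint dir or Hugging Face model ID
output_dir=./output/llama-7b-w4a4  # where quantized results will be written

# Sanity-check the values before launching training.
echo "model=$model"
echo "output_dir=$output_dir"
```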
📋 Results
QLLM achieves state-of-the-art (SoTA) performance in weight-activation quantization.
📝 Citation
If you find QLLM useful in your research, please consider citing the following paper:
@inproceedings{liu2024qllm,
title = {{QLLM}: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models},
author = {Liu, Jing and Gong, Ruihao and Wei, Xiuying and Dong, Zhiwei and Cai, Jianfei and Zhuang, Bohan},
booktitle = {International Conference on Learning Representations (ICLR)},
year = {2024},
}
🧾 License
This repository is released under the Apache 2.0 license as found in the LICENSE file.
🙏 Acknowledgement
This repository is built upon OmniQuant. We thank the authors for their open-sourced code.
About
[ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models"