BiGR
BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities
Shaozhe Hao1,
Xuantong Liu2,
Xianbiao Qi3*,
Shihao Zhao1,
Bojia Zi4,
Rong Xiao3, Kai Han1†, Kwan-Yee K. Wong1†
1The University of Hong Kong
2Hong Kong University of Science and Technology
3Intellifusion
4The Chinese University of Hong Kong
*Project lead †Corresponding authors
ICLR 2025
Advantages of BiGR
- Uniformity: BiGR is the first conditional image generation model that unifies generative and discriminative tasks within the same model. By modeling compact binary latent codes, BiGR delivers strong performance in both tasks compared to existing models.
- Efficiency: BiGR generates images at a low time cost, attributed to the small number of sampling steps required in the iterative unmasking process, while still maintaining high generation quality.
- Flexibility: BiGR can be flexibly employed for various vision applications, such as inpainting, outpainting, editing, interpolation, and enrichment in a zero-shot manner, without the need for task-specific structural changes or parameter fine-tuning.
- Scalability: BiGR demonstrates scalability in both generative and discriminative tasks, as evidenced by comprehensive evaluations of generation quality and linear-probe performance.
Method
BiGR is built on a Llama backbone and incorporates mask-token prediction and a binary transcoder. It is trained with a weighted binary cross-entropy (wBCE) loss to reconstruct masked tokens. For image generation, we design entropy-order sampling. For visual representation, we simply apply average pooling to the features of the intermediate layers.
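To make the two ideas above concrete, here is a minimal pure-Python sketch, not the released implementation. It assumes each token is a short vector of bits with a predicted probability per bit. The `pos_weight` parameter, the function names, and the rule "unmask the tokens with the lowest summed bit entropy first" are illustrative assumptions; this page does not specify the exact weighting or ordering.

```python
import math


def weighted_bce(probs, targets, masked, pos_weight=1.0, eps=1e-7):
    """Weighted binary cross-entropy averaged over the bits of masked tokens.

    probs:      list of tokens, each a list of predicted P(bit = 1)
    targets:    same shape, ground-truth bits (0 or 1)
    masked:     list of bools, True where the token was masked in training
    pos_weight: hypothetical weight on the bit = 1 term (an assumption)
    """
    total, count = 0.0, 0
    for p_tok, t_tok, is_masked in zip(probs, targets, masked):
        if not is_masked:
            continue  # only masked tokens contribute to the loss
        for p, t in zip(p_tok, t_tok):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(pos_weight * t * math.log(p)
                       + (1 - t) * math.log(1.0 - p))
            count += 1
    return total / count if count else 0.0


def bernoulli_entropy(p, eps=1e-7):
    """Entropy (in nats) of a single Bernoulli(p) bit."""
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def entropy_order(probs, masked, k):
    """Indices of the k masked tokens with the lowest summed bit entropy.

    Assumed ordering rule: the most confident tokens are unmasked first.
    """
    scored = [
        (sum(bernoulli_entropy(p) for p in p_tok), i)
        for i, (p_tok, is_masked) in enumerate(zip(probs, masked))
        if is_masked
    ]
    scored.sort()
    return [i for _, i in scored[:k]]
```

In an iterative sampler built on this sketch, each step would predict bit probabilities for all masked tokens, call `entropy_order` to pick which tokens to commit, sample their bits, and repeat until no masked tokens remain.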
Results
Quantitative Comparison
Image Generation
Zero-shot Generalized Applications
BiGR supports diverse zero-shot applications, without requiring task-specific structural changes or parameter fine-tuning.
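One way to see why no structural change is needed: with a mask-token model, tasks such as inpainting and outpainting can be posed purely as a choice of which tokens to regenerate. The sketch below is an illustrative assumption, not the project's code. `region_mask` and its parameters are hypothetical names, and the token grid is assumed to be flattened in row-major order.

```python
def region_mask(height, width, top, left, bottom, right, inside=True):
    """Build a row-major boolean mask over a height x width token grid.

    True marks a token to regenerate. With inside=True the rectangle
    [top, bottom) x [left, right) is masked (inpainting); with
    inside=False everything outside the rectangle is masked (outpainting).
    """
    mask = []
    for r in range(height):
        for c in range(width):
            in_box = top <= r < bottom and left <= c < right
            mask.append(in_box if inside else not in_box)
    return mask
```

For example, `region_mask(16, 16, 4, 4, 12, 12)` marks the central 8 x 8 block of a 16 x 16 token grid for regeneration, while the same call with `inside=False` keeps that block and regenerates the border.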
Try out BiGR yourself on Colab!
BibTeX
If you find this project useful for your research, please cite the following:
@misc{hao2024bigr,
title={Bi{GR}: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities},
author={Shaozhe Hao and Xuantong Liu and Xianbiao Qi and Shihao Zhao and Bojia Zi and Rong Xiao and Kai Han and Kwan-Yee~K. Wong},
year={2024},
}