¹PRLab, Nanjing University · ²The University of Hong Kong · ³UCAS · ⁴Lovart AI · ⁵WeChat, Tencent Inc.
* Equal Contribution † Corresponding Author
🎥 Video Demo
video_demo.mp4
🚀 Release
Paper released
Demo video released
Code coming soon
🔥 Why Bitwise AR Models Lose Diversity
In bitwise autoregressive models, the predicted probabilities often become overly peaked, giving one class near-certain confidence.
This causes top-p sampling to collapse into a deterministic choice, removing randomness.
As shown in the two subplots below, this collapse of bit-level randomness restricts feature variation. The diversity degradation arises mainly from (1) the binary classification nature of bitwise modeling and (2) overconfident output distributions.
※ The diversity issue mainly originates from bitwise tokenization, rather than from VAR (Visual Autoregressive Modeling) itself.
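The collapse described above can be reproduced with a toy example. The sketch below is not taken from the DiverseAR code (which is not yet released). It assumes each bit is predicted as a two-class distribution obtained from a single logit, and it uses a standard top-p (nucleus) filter. The helper names `top_p_filter` and `sample_bit` are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def top_p_filter(probs, top_p):
    """Keep the smallest set of classes whose cumulative probability
    reaches top_p, then renormalize over the kept classes."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

def sample_bit(logit, top_p, rng):
    """Sample one bit (0 or 1) from a single logit with top-p filtering."""
    p1 = sigmoid(logit)
    filtered = top_p_filter([1.0 - p1, p1], top_p)
    classes = list(filtered)
    weights = [filtered[c] for c in classes]
    return rng.choices(classes, weights=weights, k=1)[0]

rng = random.Random(0)

# Overconfident bit: p(1) is about 0.98, so top-p = 0.95 keeps only class 1.
peaked = [sample_bit(4.0, 0.95, rng) for _ in range(200)]

# Moderately confident bit: p(1) is about 0.60, so both classes survive.
soft = [sample_bit(0.4, 0.95, rng) for _ in range(200)]

print("peaked bit, distinct values:", sorted(set(peaked)))
print("soft bit, distinct values:  ", sorted(set(soft)))
```

With only two classes, top-p removes the minority class entirely as soon as the majority probability reaches the threshold, so every sample for that bit is identical. A large vocabulary rarely behaves this way, which is consistent with the note that the issue comes from bitwise tokenization rather than from VAR.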
🧩 The DiverseAR Framework
DiverseAR is an effective approach that enhances image and video diversity in bitwise autoregressive modeling without sacrificing visual quality.
It introduces adaptive logits smoothing and an energy-based generation path selection strategy to achieve richer, more diverse sampling.
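This README does not specify how the adaptive logits smoothing is computed, so the following is only an illustrative sketch of the general idea, not the DiverseAR method. It assumes a confidence-dependent temperature: the more peaked a bit's prediction, the more its logit is flattened. The function `smooth_logit`, the parameter `alpha`, and the temperature rule are all hypothetical choices made for this example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def smooth_logit(logit, alpha=3.0):
    """Hypothetical smoothing rule (an assumption, not the paper's formula).

    confidence = |2 * p - 1| is 0 for a 50/50 bit and close to 1 for a
    near-deterministic bit. The logit is divided by a temperature that
    grows with confidence, so peaked predictions are flattened the most.
    """
    confidence = abs(2.0 * sigmoid(logit) - 1.0)
    temperature = 1.0 + alpha * confidence
    return logit / temperature

for z in (0.5, 2.0, 6.0):
    before = sigmoid(z)
    after = sigmoid(smooth_logit(z))
    print(f"logit={z:4.1f}  p(1) before={before:.3f}  after={after:.3f}")
```

Under this assumed rule, a bit with logit 6.0 drops from a probability above 0.99 to one below 0.95, so a top-p threshold of 0.95 would again keep both classes and the bit regains randomness. The sign of the logit is preserved, so the most likely value of each bit does not change.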
📚 Citation
If you find this work helpful, please consider citing:
@article{yang2025diversear,
title={DiverseAR: Boosting Diversity in Bitwise Autoregressive Image Generation},
author={Yang, Ying and Lv, Zhengyao and Pan, Tianlin and Wang, Haofan and Yang, Binxin and Yin, Hubery and Li, Chen and Si, Chenyang},
journal={arXiv preprint arXiv:2512.02931},
year={2025}
}