WeGen is a unified framework that integrates multimodal understanding and generation, enabling users to achieve various visual generation goals through natural conversation. It excels at producing diverse, highly creative results from less detailed instructions, and it can progressively refine earlier generations while staying consistent with user-provided references.
Key Features
Unified Framework: Seamlessly integrates diverse capabilities including text-to-image generation, subject-driven generation, condition-driven generation, image restoration, and style transfer
Dynamic Instance Identity Consistency (DIIC): Maintains instance identity consistency while allowing natural variations in the generated content
Demo
Coming soon.
Installation
Clone the repository:
git clone https://github.com/hzphzp/WeGen.git
cd WeGen/
Prepare the base environment. We use Ubuntu 20 with Python 3.8 and H20 or 910B GPUs.
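Dependencies are installed by env.sh in the next step; as a rough sketch of creating a matching base environment first (assuming a conda-based workflow, which is an assumption rather than a documented requirement):
# Assumed conda-based setup; the repo only specifies Ubuntu 20 and Python 3.8
conda create -n wegen python=3.8 -y
conda activate wegen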
Install required packages:
bash env.sh
Download the pre-trained models from here and organize the pretrained model folder as follows:
Run the following command to evaluate the model on a node with 8 H20/910B GPUs:
bash scripts/inference.sh
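For reference, if you need to adapt the launch to your own setup, a generic single-node, 8-process launch with PyTorch's torchrun looks like the sketch below; the entry script and config path are hypothetical placeholders, so check scripts/inference.sh for the actual entry point and arguments.
# Illustrative only: a generic 8-process distributed launch.
# inference.py and configs/inference.yaml are placeholder names, not files in this repo.
torchrun --nproc_per_node=8 inference.py --config configs/inference.yaml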
Citing
If you find this code and work useful, please consider citing the following paper and starring this repo. Thank you very much!
@article{huang2025wegen,
title={WeGen: A Unified Model for Interactive Multimodal Generation as We Chat},
author={Huang, Zhipeng and Zhuang, Shaobin and Fu, Canmiao and Yang, Binxin and Zhang, Ying and Sun, Chong and Zhang, Zhizheng and Wang, Yali and Li, Chen and Zha, Zheng-Jun},
journal={arXiv preprint arXiv:2503.01115},
year={2025}
}