Inserting Anybody in Diffusion Models via Celeb Basis
Method
The text embedding space has a useful interpolation property, which inspired us to define a dedicated space for human generation.
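A minimal sketch of what "interpolation" means here, assuming two embeddings of the same shape (for example, the token embeddings of two celebrity names). The function names `interpolate_embeddings` and `interpolation_path` are hypothetical and only illustrate plain linear interpolation, not the authors' code.

```python
import numpy as np


def interpolate_embeddings(e_a, e_b, alpha):
    """Linearly interpolate between two embeddings of equal shape.

    alpha = 0 returns e_a, alpha = 1 returns e_b (hypothetical helper).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * np.asarray(e_a) + alpha * np.asarray(e_b)


def interpolation_path(e_a, e_b, steps):
    """Return `steps` evenly spaced embeddings from e_a to e_b (inclusive)."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [interpolate_embeddings(e_a, e_b, i / (steps - 1)) for i in range(steps)]
```

Each intermediate embedding would then be fed to the text encoder in place of a name embedding; that downstream step is not shown.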
First, we collect about 1,500 celebrity names as the initial collection. Then, we manually filter this collection down to $m=691$ names, based on how well the text-to-image diffusion model (Stable Diffusion) synthesizes each person from the corresponding name prompt. Next, each filtered name is tokenized and encoded into a celeb embedding group $g_i$. Finally, we apply Principal Component Analysis (PCA) to these groups to build a compact orthogonal basis.
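The PCA step can be sketched with NumPy as below. This assumes, for illustration only, that every embedding group $g_i$ has the same shape (t token embeddings of dimension d per name) and that PCA is run separately for each token slot; the function name `build_celeb_basis`, the array layout, and the choice of k are hypothetical, not taken from the authors' implementation.

```python
import numpy as np


def build_celeb_basis(groups: np.ndarray, k: int):
    """Build a PCA basis from stacked celeb embedding groups.

    groups: array of shape (m, t, d): m names, t token embeddings per name,
            d embedding dimension (layout assumed for illustration).
    k:      number of principal components kept per token slot.

    Returns (mean, basis) with shapes (t, d) and (t, k, d); the rows of
    basis[j] are orthonormal principal directions for token slot j.
    """
    m, t, d = groups.shape
    if k > min(m, d):
        raise ValueError("k cannot exceed min(m, d)")
    mean = groups.mean(axis=0)  # (t, d) mean embedding per token slot
    basis = np.empty((t, k, d))
    for j in range(t):
        centered = groups[:, j, :] - mean[j]  # (m, d) centered data
        # The right singular vectors of the centered data are the
        # principal axes, already sorted by decreasing singular value.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        basis[j] = vt[:k]
    return mean, basis
```

Using SVD on the centered data avoids forming the d×d covariance matrix explicitly; keeping only the top k directions is what makes the basis compact.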
During training (left), we optimize the coefficients of the celeb basis with the help of a fixed face encoder. During inference (right), we combine the learned personalized weights with the shared celeb basis to generate images of the input identity.
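The inference-time combination can be sketched as "mean plus a weighted sum of basis directions", assuming the mean/basis layout of a per-token-slot PCA basis and a (t, k) coefficient array per identity. Both function names are hypothetical. `project_to_basis` is not part of the described method; it is included only to illustrate that, with an orthonormal basis, coefficients can be recovered from an embedding by a dot product.

```python
import numpy as np


def compose_identity_embedding(coeffs, mean, basis):
    """Map personalized coefficients onto the shared celeb basis.

    coeffs: (t, k) learned weights for one identity (layout assumed).
    mean:   (t, d) mean celeb embedding per token slot.
    basis:  (t, k, d) principal directions per token slot.
    Returns (t, d) embeddings for the identity.
    """
    # For each slot j: mean[j] + sum_i coeffs[j, i] * basis[j, i]
    return mean + np.einsum("tk,tkd->td", coeffs, basis)


def project_to_basis(embedding, mean, basis):
    """Recover coefficients of an embedding (illustration only).

    Exact for embeddings inside the span of the basis, and a least-squares
    fit otherwise, because each basis[j] has orthonormal rows.
    """
    return np.einsum("tkd,td->tk", basis, embedding - mean)
```

Only the small (t, k) coefficient array is identity-specific; the mean and basis are shared across all identities.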
Comparisons using StyleGAN synthetic faces as training samples.
Single-person comparisons using real identities as training samples.
Multi-person personalization on real identities.
More Evaluations
Two-person interaction.
Single-person personalization.
Single-person personalization.
Expression control.
BibTeX
@article{yuan2023celebbasis,
title={Inserting Anybody in Diffusion Models via Celeb Basis},
author={Yuan, Ge and Cun, Xiaodong and Zhang, Yong and Li, Maomao and Qi, Chenyang and Wang, Xintao and Shan, Ying and Zheng, Huicheng},
journal={arXiv preprint arXiv:2306.00926},
year={2023}
}