Generative Adversarial Networks (GANs), particularly StyleGAN and its
variants, have demonstrated remarkable capabilities in generating highly
realistic images. Despite their success, adapting these models to diverse tasks
such as domain adaptation, reference-guided synthesis, and text-guided manipulation
with limited training data remains challenging. To this end, we
present a novel framework that significantly extends the capabilities of a pre-trained
StyleGAN by integrating CLIP space via hypernetworks. This integration allows dynamic
adaptation of StyleGAN to new domains defined by reference images or textual descriptions.
Additionally, we introduce a CLIP-guided discriminator that enhances the alignment
between generated images and target domains, ensuring superior image quality. Our
approach demonstrates unprecedented flexibility, enabling text-guided image manipulation
without the need for text-specific training data and facilitating seamless style transfer.
Comprehensive qualitative and quantitative evaluations confirm the robustness and superior
performance of our framework compared to existing methods.
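To make the CLIP-guided discriminator idea more concrete, the snippet below sketches one common way to implement such conditioning in PyTorch, namely a projection-style head whose real/fake logit is shifted by the inner product between image features and a projected CLIP embedding of the target domain. The class name, dimensions, and head design are illustrative assumptions, not necessarily the exact architecture used in HyperGAN-CLIP.

import torch
import torch.nn as nn

class CLIPGuidedDiscriminatorHead(nn.Module):
    """Projection-style head: conditions the real/fake score on a CLIP embedding (illustrative sketch)."""
    def __init__(self, feature_dim: int = 512, clip_dim: int = 512):
        super().__init__()
        self.unconditional = nn.Linear(feature_dim, 1)                   # plain real/fake score
        self.projection = nn.Linear(clip_dim, feature_dim, bias=False)   # maps CLIP embedding into feature space

    def forward(self, img_features: torch.Tensor, clip_embedding: torch.Tensor) -> torch.Tensor:
        # img_features:   (B, feature_dim) features extracted from a real or generated image
        # clip_embedding: (B, clip_dim) CLIP embedding describing the target domain (image or text)
        clip_embedding = clip_embedding / clip_embedding.norm(dim=-1, keepdim=True)
        cond = (img_features * self.projection(clip_embedding)).sum(dim=-1, keepdim=True)
        return self.unconditional(img_features) + cond                   # domain-conditioned logit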
Overview of HyperGAN-CLIP. The framework employs hypernetwork modules to
adjust StyleGAN generator weights based on images or text prompts. These inputs
facilitate domain adaptation, attribute transfer, or image editing. The modulated weights
blend with the original features to produce images that align with the specified domains or tasks,
such as reference-guided synthesis and text-guided manipulation, while maintaining the integrity of the source image.
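As a rough illustration of this modulation mechanism, the sketch below shows how a small hypernetwork could map a CLIP embedding to a per-sample weight offset for one generator convolution, which is then blended with the frozen StyleGAN weight. The MLP design, shapes, and additive blending rule are simplifying assumptions rather than the exact formulation used here.

import torch
import torch.nn as nn

class WeightHypernetwork(nn.Module):
    """Predicts a weight offset for one StyleGAN convolution from a CLIP embedding (illustrative sketch)."""
    def __init__(self, clip_dim: int, out_channels: int, in_channels: int, hidden: int = 256):
        super().__init__()
        self.out_channels, self.in_channels = out_channels, in_channels
        self.mlp = nn.Sequential(
            nn.Linear(clip_dim, hidden),
            nn.LeakyReLU(0.2),
            nn.Linear(hidden, out_channels * in_channels),
        )

    def forward(self, clip_embedding: torch.Tensor) -> torch.Tensor:
        delta = self.mlp(clip_embedding)                       # (B, out_channels * in_channels)
        return delta.view(-1, self.out_channels, self.in_channels)

def modulate_weight(frozen_weight: torch.Tensor, delta: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    # alpha = 0 recovers the original pre-trained generator;
    # larger alpha moves the weights toward the target domain.
    return frozen_weight + alpha * delta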
HyperGAN-CLIP and its Applications. We introduce HyperGAN-CLIP,
a flexible framework that enhances the capabilities of a pre-trained StyleGAN model
for a multitude of tasks, including multiple-domain one-shot adaptation, reference-guided
image synthesis, and text-guided image manipulation. Our method pushes the boundaries of
image synthesis and editing, enabling users to create diverse and high-quality images with
remarkable ease and precision.
2. Qualitative Comparisons - Domain Adaptation
Comparison against state-of-the-art few-shot domain adaptation methods. Our
proposed HyperGAN-CLIP model outperforms competing methods in accurately capturing the visual
characteristics of the target domains.
3. Domain Mixing
Domain mixing. Our approach can fuse multiple domains to create
novel compositions. By averaging and re-scaling the CLIP embeddings of two target
domains, we can generate images that blend characteristics from both.
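A minimal sketch of this mixing rule, assuming unit-normalized CLIP embeddings for the two target domains; the resulting vector is re-scaled back to unit norm before it is fed to the hypernetwork modules.

import torch

def mix_domain_embeddings(e_a: torch.Tensor, e_b: torch.Tensor, weight: float = 0.5) -> torch.Tensor:
    # e_a, e_b: CLIP embeddings of the two target domains, shape (B, clip_dim)
    e_a = e_a / e_a.norm(dim=-1, keepdim=True)
    e_b = e_b / e_b.norm(dim=-1, keepdim=True)
    mixed = weight * e_a + (1.0 - weight) * e_b        # average the two domains
    return mixed / mixed.norm(dim=-1, keepdim=True)    # re-scale to unit norm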
4. Semantic Editing in Target Domains
Semantic editing in target domains. Since the latent mapper is kept
intact, our approach allows existing latent space discovery methods to be used for
semantic edits. We manipulate two sample face images from adapted domains by varying
age, smile, and pose using InterFaceGAN.
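For illustration, such an edit simply shifts the latent code along a pre-computed attribute direction before it is passed to the adapted generator. The sketch below assumes a W+ latent code and a unit-norm direction obtained with InterFaceGAN; age_direction is a hypothetical placeholder, not an artifact provided by this framework.

import torch

def semantic_edit(w: torch.Tensor, direction: torch.Tensor, strength: float) -> torch.Tensor:
    # w:         (B, num_layers, 512) W+ latent code of the source image
    # direction: (512,) unit-norm attribute direction (e.g., from InterFaceGAN)
    return w + strength * direction                    # broadcast over batch and layers

# Usage (hypothetical direction tensor): positive strength ages the face, negative rejuvenates it.
# edited_w = semantic_edit(w, age_direction, strength=2.0)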
Comparison with state-of-the-art reference-guided image synthesis approaches.
Our approach effectively transfers the style of the target image to the source image while
preserving the identity of the source better than competing methods.
6. Reference-Guided Synthesis with Mixed Embeddings
Reference-guided image synthesis with mixed embeddings. Each row shows the input
image, the initial result with the CLIP image embedding, the refined result with a mixed embedding
that incorporates the target attribute with α=0.5, and the reference image, respectively.
Target text attributes are beard (top row), black hair (middle row), and smiling
(bottom row). Incorporating mixed-modality embeddings results in more accurate and detailed image modifications.
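The mixed embedding can be computed directly with the public OpenAI CLIP API, as sketched below for the "beard" attribute with α = 0.5. The ViT-B/32 backbone, file name, and preprocessing are assumptions and may differ from the configuration used in the experiments.

import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("reference.jpg")).unsqueeze(0).to(device)   # reference image (placeholder path)
tokens = clip.tokenize(["beard"]).to(device)                              # target text attribute

with torch.no_grad():
    e_img = model.encode_image(image).float()
    e_txt = model.encode_text(tokens).float()

e_img = e_img / e_img.norm(dim=-1, keepdim=True)
e_txt = e_txt / e_txt.norm(dim=-1, keepdim=True)

alpha = 0.5
mixed = alpha * e_txt + (1.0 - alpha) * e_img
mixed = mixed / mixed.norm(dim=-1, keepdim=True)   # mixed-modality embedding passed to the hypernetwork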
7. Reference-Guided Synthesis on Real Images
Reference-guided image synthesis on real images. Our model effectively
transfers the style of the target image to the source image while preserving the source
identity. The results demonstrate the robustness of our model in handling real images.
Comparisons with state-of-the-art text-guided image manipulation methods. Our model
shows remarkable versatility in manipulating images across a diverse range of textual descriptions. The
results vividly illustrate our model's ability to accurately apply changes based on target descriptions
encompassing both single and multiple attributes. Compared to competing approaches, our model preserves
the identity of the input much better while successfully executing the desired manipulations.
8. Text-Guided Image Manipulation on Real Images
Text-guided image manipulation on real images. Our model can effectively manipulate
real images based on textual descriptions. The results demonstrate the robustness of our model in handling
real images and executing the desired manipulations.
BibTeX
@inproceedings{Anees2024HyperGANCLIP,
title = {HyperGAN-CLIP: A Unified Framework for Domain Adaptation, Image Synthesis and Manipulation},
author = {Abdul Basit Anees and Ahmet Canberk Baykal and Duygu Ceylan and Aykut Erdem and Erkut Erdem and Muhammed Burak Kızıl},
booktitle = {SIGGRAPH Asia 2024 Conference Papers},
year = {2024}
}