ReF-LDM: A Latent Diffusion Model for Reference-based Face Image Restoration
NeurIPS 2024
🔥Key highlights include:
- CacheKV: efficiently incorporates a flexible number of reference images into an LDM.
- Timestep-scaled identity loss: addresses the out-of-distribution (OOD) issue of 1-step predictions during diffusion training.
- FFHQ-Ref dataset: 20,405 face images with corresponding reference images.
Model Pipeline
We first extract and cache the key and value tokens of the reference images just once. These cached KV tokens can then be utilized repeatedly in each of the T timesteps of the main denoising process.
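A minimal sketch of this idea in PyTorch (not the official implementation): the reference K/V tokens are computed once, then reused by cross-attention at every denoising timestep. The module and function names (`ref_encoder`, `to_k`, `to_v`, `to_q`, `cache_reference_kv`, `ref_cross_attention`) are illustrative assumptions, not names from the released code.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def cache_reference_kv(ref_encoder, to_k, to_v, ref_images):
    """Encode N reference images once and cache their key/value tokens."""
    ref_tokens = ref_encoder(ref_images)        # (N, L, C) tokens per reference image
    k = to_k(ref_tokens).flatten(0, 1)          # (N*L, C) concatenate tokens of all references
    v = to_v(ref_tokens).flatten(0, 1)          # (N*L, C)
    return k.unsqueeze(0), v.unsqueeze(0)       # (1, N*L, C), reused for every timestep

def ref_cross_attention(to_q, query_tokens, cached_k, cached_v):
    """Attend from the denoiser's feature tokens to the cached reference K/V."""
    q = to_q(query_tokens)                      # (B, M, C)
    k = cached_k.expand(q.shape[0], -1, -1)
    v = cached_v.expand(q.shape[0], -1, -1)
    return F.scaled_dot_product_attention(q, k, v)
```

In this sketch, `cache_reference_kv` runs once per input, while `ref_cross_attention` is called inside the denoising network at all T timesteps, which is why the cost of handling the references does not grow with the number of sampling steps.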
CacheKV Ablation
We also experimented with other designs for integrating the reference images. However, they are either ineffective (channel-concatenation, cross-attention) or inefficient (spatial-concatenation).
*Note: tested with 5 reference images on a single GTX 1080.
Timestep-scaled Identity Loss
We found that training with an identity loss degrades the image quality of the diffusion model. The likely reason is that the one-step model prediction from a large timestep is out-of-distribution for the pretrained ArcFace model. Our solution is simple and effective: downscale the identity loss when a larger timestep is sampled during training.
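A minimal sketch of this loss (an assumption, not the paper's exact formula): the ArcFace identity loss on the 1-step x0 prediction is downweighted as the sampled timestep grows; here the weight is taken from the cumulative noise schedule, but the paper's scaling may differ.

```python
import torch
import torch.nn.functional as F

def timestep_scaled_id_loss(arcface, x0_pred, x0_target, t, alphas_cumprod):
    """Cosine identity loss, downscaled for large (noisy) timesteps."""
    emb_pred = F.normalize(arcface(x0_pred), dim=-1)      # identity embedding of 1-step prediction
    emb_gt = F.normalize(arcface(x0_target), dim=-1)      # identity embedding of ground truth
    id_loss = 1.0 - (emb_pred * emb_gt).sum(dim=-1)       # per-sample cosine distance
    scale = alphas_cumprod[t].sqrt()                      # assumed schedule-based weight, small at large t
    return (scale * id_loss).mean()
```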
Qualitative Results
Our ReF-LDM achieves strong identity similarity by effectively leveraging the reference images.
BibTeX
@inproceedings{hsiao2024refldm,
title={ReF-LDM: A Latent Diffusion Model for Reference-based Face Image Restoration},
author={Chi-Wei Hsiao and Yu-Lun Liu and Cheng-Kun Yang and Sheng-Po Kuo and Yucheun Kevin Jou and Chia-Ping Chen},
booktitle={Advances in Neural Information Processing Systems},
year={2024}
}