[WACV'25] CorrFill: Enhancing Faithfulness in Reference-based Inpainting with Correspondence Guidance in Diffusion Models
The Official PyTorch implementation of CorrFill: Enhancing Faithfulness in Reference-based Inpainting with Correspondence Guidance in Diffusion Models (WACV'25).
Kuan-Hung Liu1,
Cheng-Kun Yang*2,
Min-Hung Chen3,
Yu-Lun Liu1,
Yen-Yu Lin1
1National Yang Ming Chiao Tung University, 2National Taiwan University, 3NVIDIA
(*Now at MediaTek Inc., Taiwan.)
[Paper] [Website] [BibTeX] [WACV'25 Poster]
This work introduces CorrFill, a training-free module designed to enhance the faithfulness of reference-based image inpainting in diffusion models. CorrFill estimates the correspondence between the reference and target images during inpainting and uses it to guide the diffusion model through self-attention masking and input latent optimization. Experiments on RealEstate10K and MegaDepth with four different baseline diffusion models show higher faithfulness in both quantitative and qualitative results.
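As a rough intuition for the self-attention masking mentioned above, the toy sketch below restricts one target token's attention to reference tokens near its estimated match. It is not the authors' implementation (that lives in corrfill_module/corrfill.py); the 1D token layout, the `radius` parameter, and both function names are assumptions made purely for illustration.

```python
import math


def correspondence_mask(num_ref_tokens, matched_index, radius):
    """Allow only reference tokens within `radius` of the matched token (hypothetical rule)."""
    return [abs(j - matched_index) <= radius for j in range(num_ref_tokens)]


def masked_softmax(scores, allowed):
    """Softmax over `scores` where disallowed positions receive zero weight.

    At least one position must be allowed.
    """
    masked = [s if ok else float("-inf") for s, ok in zip(scores, allowed)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Attention scores of one target token over five reference tokens (made-up numbers).
scores = [0.2, 1.5, 0.3, 2.0, 0.1]
# Assume the estimated correspondence says this target token matches reference token 1.
allowed = correspondence_mask(len(scores), matched_index=1, radius=1)
weights = masked_softmax(scores, allowed)
```

In this toy case the highest raw score (token 3) lies outside the allowed neighborhood, so it receives zero weight and the attention mass is redistributed over tokens 0 to 2.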
For business inquiries, please visit our website and submit the form: NVIDIA Research Licensing.
This code base requires Python 3.10.
git clone git@github.com:khliu0000/corr_inpainting.git
cd corr_inpainting
conda env create --file environment.yml python=3.10
Most of the required models will be automatically downloaded through huggingface when first executed.
Reproducing IP-Adapter-Plus with CorrFill requires runwayml's Stable Diffusion, which is currently unavailable. However, you can use other inpainting models that are compatible with IP-Adapter-Plus.
The image encoder of IP-Adapter-Plus and the weights for the adapter are required to reproduce results for IP-Adapter-Plus with CorrFill.
Please download models/image_encoder/* and models/ip-adapter-plus_sd15.bin from the huggingface repository, and place them in the following structure:
corr_inpainting  # repo root
├── IP-Adapter
│   └── models
│       ├── image_encoder
│       └── ip-adapter-plus_sd15.bin
...
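Before running the IP-Adapter-Plus experiments, it can help to confirm that the files ended up where the layout above expects them. The following is a minimal sketch, assuming it is run from (or pointed at) the repo root; the function name `missing_ip_adapter_files` is hypothetical and not part of this code base.

```python
from pathlib import Path

# Paths taken from the directory layout described above, relative to the repo root.
ENCODER_DIR = Path("IP-Adapter/models/image_encoder")
ADAPTER_WEIGHTS = Path("IP-Adapter/models/ip-adapter-plus_sd15.bin")


def missing_ip_adapter_files(repo_root="."):
    """Return the expected IP-Adapter-Plus paths that are absent under `repo_root`."""
    root = Path(repo_root)
    missing = []
    encoder_dir = root / ENCODER_DIR
    # The encoder directory must exist and contain at least one file.
    if not encoder_dir.is_dir() or not any(encoder_dir.iterdir()):
        missing.append(ENCODER_DIR.as_posix())
    if not (root / ADAPTER_WEIGHTS).is_file():
        missing.append(ADAPTER_WEIGHTS.as_posix())
    return missing
```

An empty list from `missing_ip_adapter_files(".")` suggests the layout matches; otherwise the returned paths indicate what still needs to be downloaded.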
Due to copyright restrictions, we cannot provide our testing data. However, we offer examples of how we generate image pairs and random masks. Please refer to ./data_process/.
- To reproduce the experiments, at least 24GB of memory is recommended.
- Disabling latent input optimization reduces the amount of memory required, but it also lowers performance.
  - To disable latent optimization, set latent_optimize to false in the config files.
- The model runs in float16 mode by default.
- Results may vary slightly with each execution. Please refer to PyTorch's documentation on reproducibility.
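To try a lower-memory run without editing the shipped config files, one option is to write a modified copy. The sketch below assumes that latent_optimize is a top-level key in the JSON config, which this README does not confirm; the helper name and the output file name in the usage comment are hypothetical.

```python
import json
from pathlib import Path


def write_low_memory_config(src, dst):
    """Copy the JSON config at `src` to `dst` with latent optimization disabled."""
    config = json.loads(Path(src).read_text())
    # Assumption: `latent_optimize` sits at the top level of the config.
    config["latent_optimize"] = False
    Path(dst).write_text(json.dumps(config, indent=2))
    return config


# Example usage (the output file name is made up):
# write_low_memory_config(
#     "config/leftrefill_realestate.json",
#     "config/leftrefill_realestate_lowmem.json",
# )
```

The new file can then be passed to corr_guided.py through --config, keeping in mind that disabling latent optimization lowers performance as noted above.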
Results are stored in ./results/. Use the commands below to reproduce the experiments.
- RealEstate10K
python corr_guided.py --config config/leftrefill_realestate.json
- MegaDepth
python corr_guided.py --config config/leftrefill_megadepth.json
- RealEstate10K
python corr_guided.py --config config/sidebyside_realestate.json
- MegaDepth
python corr_guided.py --config config/sidebyside_megadepth.json
- RealEstate10K
python corr_guided.py --config config/ipadapter_realestate.json
- MegaDepth
python corr_guided.py --config config/ipadapter_megadepth.json
The directory ipa_embed will be created to cache the image embedding for saving memory.
- RealEstate10K
python corr_guided.py --config config/paintbyexample_realestate.json
- MegaDepth
python corr_guided.py --config config/paintbyexample_megadepth.json
The directory pbe_embed will be created to cache the image embedding for saving memory.
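The eight commands above follow a single naming pattern, so they can be generated and run in sequence. This is a convenience sketch rather than part of the code base: `build_commands` and `run_all` are hypothetical helpers, and the script is assumed to be launched from the repo root.

```python
import subprocess
import sys

# Baseline and dataset names as they appear in the config file names above.
BASELINES = ("leftrefill", "sidebyside", "ipadapter", "paintbyexample")
DATASETS = ("realestate", "megadepth")


def build_commands(baselines=BASELINES, datasets=DATASETS):
    """Build one corr_guided.py invocation per (baseline, dataset) pair."""
    return [
        [sys.executable, "corr_guided.py", "--config", f"config/{b}_{d}.json"]
        for b in baselines
        for d in datasets
    ]


def run_all(commands):
    """Run the commands one after another, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_all(build_commands())` runs every experiment; passing a subset such as `build_commands(baselines=("leftrefill",))` restricts the run to one baseline.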
- corr_guided.py: Script for reproducing the experiments.
- corrfill_module: Implementation of our main method.
  - corrfill_module/corrfill.py: Code for the CorrFill inpainting method, including attention processor swapping and attention map collection.
  - corrfill_module/cg_pipeline.py: Code for the customized diffusers pipeline of CorrFill.
  - corrfill_module/pbe_pipeline.py: Code for the customized diffusers pipeline of CorrFill, used in the PaintByExample experiments.
  - corrfill_module/pbe_image_encoder.py: Code for the image encoder used by PaintByExample, from diffusers.
  - corrfill_module/utils.py: Code for loss calculation for both pipelines.
  - corrfill_module/share.py: Shared objects used globally.
- data_process: Example scripts for dataset processing, provided for reference.
- ip_adapter: IP-Adapter modules for the IP-Adapter Plus experiments, from IP-Adapter.
- The implementation is based on huggingface diffusers.
- The implementation of latent optimization is inspired by Diffusion Self-Guidance for Controllable Image Generation and HD-Painter.
- The IP-Adapter Plus module is adapted from tencent-ailab.
- The PaintByExample module is implemented by diffusers.
- The implementation of LeftRefill is adapted from ewrfcas.
- The example script for RealEstate10K collection is adapted from RealEstate10K downloader.
- The example script for mask generation is adapted from Tangshitao's implementation of QuadTreeAttention for LoFTR.
- The learned prompt embeds of LeftRefill are extracted from the release weights of LeftRefill.
- The RealEstate10K dataset consists of YouTube frames annotated by Google.
- The MegaDepth dataset is collected from the MegaDepth project.
If you find CorrFill useful, please consider giving a star and citation:
@inproceedings{liu2025corrfill,
title={CorrFill: Enhancing Faithfulness in Reference-based Inpainting with Correspondence Guidance in Diffusion Models},
author={Liu, Kuan-Hung and Yang, Cheng-Kun and Chen, Min-Hung and Liu, Yu-Lun and Lin, Yen-Yu},
booktitle={WACV},
year={2025}
}
Copyright © 2024, NVIDIA Corporation. All rights reserved.
This work is made available under the NVIDIA Source Code License-NC. Click here to view a copy of this license.

