Edicho: Consistent Image Editing in the Wild
Qingyan Bai1,2, Hao Ouyang2, Yinghao Xu3, Qiuyu Wang2, Ceyuan Yang4, Ka Leong Cheng1,2, Yujun Shen2, Qifeng Chen1
1 HKUST   2 Ant Group   3 Stanford University   4 CUHK
Overview
Despite clear practical demand, consistent editing across in-the-wild images remains a technical challenge, arising from uncontrollable factors such as object poses, lighting conditions, and photography environments.
Edicho steps in with a training-free solution based on diffusion models, featuring a fundamental design principle of using explicit image correspondence to direct editing.
Specifically, the key components include an attention manipulation module and a carefully refined classifier-free guidance (CFG) denoising strategy, both of which take into account the pre-estimated correspondence.
Such an inference-time algorithm enjoys a plug-and-play nature and is compatible with most diffusion-based editing methods, such as ControlNet and BrushNet.
Extensive results demonstrate the efficacy of Edicho in consistent cross-image editing under diverse settings.
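As a rough illustration of this plug-and-play property, the sketch below (assuming the Hugging Face diffusers API; not the authors' released code) swaps the UNet's attention processors on an off-the-shelf inpainting pipeline at inference time, with no training involved. CorrAttnProcessor is a hypothetical stub standing in for a correspondence-aware processor like the one sketched under Method.

import torch
from diffusers import StableDiffusionInpaintPipeline
from diffusers.models.attention_processor import AttnProcessor

class CorrAttnProcessor(AttnProcessor):
    # Hypothetical stub: a real version would align attention keys/values
    # across the edited images using pre-computed correspondence (see Method).
    def __init__(self, correspondence):
        super().__init__()
        self.correspondence = correspondence

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
)
# Plug-and-play: only the attention processors are replaced; the pipeline,
# weights, and sampler all stay untouched.
pipe.unet.set_attn_processor(CorrAttnProcessor(correspondence=None))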
Method
Motivation.
In the task of in-the-wild image editing, learning-based methods often lack proper regularization,
resulting in inconsistent edits due to the difficulty of obtaining high-quality training data and enforcing uniformity constraints.
Non-optimization methods rely on implicit correspondence from attention features for appearance transfer, but struggle with unstable predictions and intrinsic image variations,
leading to inconsistent or distorted edits. In the figure below, we visualize the correspondence predicted by explicit and attention-based implicit methods, respectively,
along with the attention maps used for correspondence prediction (regions with the highest attention weights are outlined with dashed circles).
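To make "explicit" concrete, here is a minimal sketch in plain PyTorch of correspondence by nearest-neighbour matching over dense features; the random tensors stand in for features from any frozen extractor, and this simplification is ours, not the paper's exact matcher.

import torch
import torch.nn.functional as F

def explicit_correspondence(feat_src, feat_tgt):
    # feat_*: (C, H, W) dense feature maps. Returns, for every target location,
    # the index of its best-matching source location (hard nearest neighbour).
    C, H, W = feat_src.shape
    src = F.normalize(feat_src.flatten(1), dim=0)  # (C, H*W), unit-norm per location
    tgt = F.normalize(feat_tgt.flatten(1), dim=0)  # (C, H*W)
    sim = tgt.t() @ src                            # (H*W, H*W) cosine similarities
    return sim.argmax(dim=1)                       # (H*W,) source index per target

feat_src, feat_tgt = torch.randn(2, 256, 32, 32)   # stand-in feature maps
matches = explicit_correspondence(feat_src, feat_tgt)

Unlike attention-derived implicit matches, such a mapping is computed once, before denoising, and stays fixed across all diffusion steps.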
Framework.
To achieve consistent editing, we propose a training-free and plug-and-play method which first predicts explicit correspondence for the input
images. The pre-computed correspondence is then injected into the pre-trained diffusion model to guide denoising at two levels: (a)
attention features and (b) noisy latents in classifier-free guidance (CFG), as shown in the figure below.
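The sketch below, in plain PyTorch, illustrates the two levels; the token shapes, the warping-by-indexing, and the fusion weight alpha are simplifying assumptions for illustration rather than the released implementation.

import torch
import torch.nn.functional as F

def corr_guided_attention(q_tgt, k_src, v_src, corr):
    # (a) Attention-feature level: the target's queries attend to the source's
    # keys/values after aligning them with the explicit correspondence.
    # q_tgt, k_src, v_src: (N, C) token features; corr: (N,) source index per target token.
    k_aligned = k_src[corr]                        # warp source keys into target layout
    v_aligned = v_src[corr]                        # warp source values likewise
    scores = q_tgt @ k_aligned.t() / q_tgt.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v_aligned

def corr_guided_cfg(eps_uncond, eps_cond, eps_cond_src, corr, scale=7.5, alpha=0.5):
    # (b) Noisy-latent level: a refined CFG step that fuses the target's
    # conditional noise prediction with the source's, warped by correspondence.
    # eps_*: (C, H, W) noise predictions; corr: (H*W,) source index per target pixel.
    C, H, W = eps_cond.shape
    warped = eps_cond_src.flatten(1)[:, corr].view(C, H, W)
    fused = alpha * eps_cond + (1 - alpha) * warped       # correspondence-aware condition
    return eps_uncond + scale * (fused - eps_uncond)      # standard CFG combination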
Results
Shown below are qualitative comparisons on global and local editing, respectively.
We also adopt the neural regressor Dust3R for 3D reconstruction from the edited images, matching their 2D points in 3D space:
With outputs from our consistent editing method (upper) and customization techniques, customized generation (lower) can be achieved by injecting the edited concepts into the generative model.
Additional Results
Shown below are additional qualitative results of the proposed method for local (upper three) and global (lower three) editing. The inpainted
regions for local editing are indicated in light red. “Fixed Seed” denotes editing results produced from the same random seed (i.e., the same
initial noise).
BibTeX
@article{bai2024edicho,
  title   = {Edicho: Consistent Image Editing in the Wild},
  author  = {Bai, Qingyan and Ouyang, Hao and Xu, Yinghao and Wang, Qiuyu and Yang, Ceyuan and Cheng, Ka Leong and Shen, Yujun and Chen, Qifeng},
  journal = {arXiv preprint arXiv:2412.21079},
  year    = {2024}
}