Zixin Yin1, Ling-Hao Chen2,3, Lionel Ni1,4, Xili Dai4
1HKUST, 2Tsinghua University, 3IDEA Research, 4HKUST(GZ)
✨ ACM SIGGRAPH Asia 2025 ✨
| Source Video | Edited Video |
|---|---|
| ![]() | ![]() |
Install the dependencies:

```bash
pip install -r requirements.txt
```

Download the required diffusion models:

- Stable Diffusion 3: `/path/to/stable-diffusion-3-medium-diffusers`
- FLUX.1-dev: `/path/to/FLUX.1-dev`
- CogVideoX-2b: `/path/to/CogVideoX-2b`

Update the model paths in the scripts accordingly.
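If the weights are not yet available locally, a sketch like the following can fetch them from the Hugging Face Hub. The repo IDs are assumptions based on the models' public Hub pages, and the `local_dir` values should be adjusted to match the paths used in the scripts:

```python
# Hypothetical download helper (not part of this repo).
from huggingface_hub import snapshot_download

# Repo IDs are assumptions; point local_dir at the paths the scripts expect.
for repo_id, local_dir in [
    ("stabilityai/stable-diffusion-3-medium-diffusers", "/path/to/stable-diffusion-3-medium-diffusers"),
    ("black-forest-labs/FLUX.1-dev", "/path/to/FLUX.1-dev"),
    ("THUDM/CogVideoX-2b", "/path/to/CogVideoX-2b"),
]:
    snapshot_download(repo_id=repo_id, local_dir=local_dir)
```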
We provide two demonstration scripts in the `script/` directory:

```bash
bash script/sd3_consist_edit.sh    # consistent editing demo
bash script/sd3_inconsist_edit.sh  # inconsistent editing demo
```

Consistent editing with SD3 (full consistency strength, `--alpha 1.0`):

```bash
python run_synthesis_sd3.py \
--src_prompt "a portrait of a woman in a red dress in a forest, best quality" \
--tgt_prompt "a portrait of a woman in a yellow dress in a forest, best quality" \
--edit_object "dress" \
--out_dir "output" \
--alpha 1.0 \
--model_path "/path/to/stable-diffusion-3-medium-diffusers"python run_synthesis_sd3.py \
--src_prompt "a portrait of a woman in a red dress, realistic style, best quality" \
--tgt_prompt "a portrait of a woman in a yellow dress, cartoon style, best quality" \
--edit_object "dress" \
--out_dir "output" \
--alpha 0.3 \
--model_path "/path/to/stable-diffusion-3-medium-diffusers"python run_synthesis_flux.py \
--src_prompt "a portrait of a woman in a red dress in a forest, best quality" \
--tgt_prompt "a portrait of a woman in a yellow dress in a forest, best quality" \
--edit_object "dress" \
--out_dir "output" \
--alpha 1.0 \
--model_path "/path/to/FLUX.1-dev"python run_synthesis_cog.py \
--src_prompt "a portrait of a woman in a red dress in a forest, best quality" \
--tgt_prompt "a portrait of a woman in a yellow dress in a forest, best quality" \
--edit_object "dress" \
--out_dir "output" \
--alpha 1.0 \
--model_path "/path/to/CogVideoX-2b"python run_real_sd3.py \
--src_prompt "a girl with a red hat and red t-shirt is sitting in a park, best quality" \
--tgt_prompt "a girl with a yellow hat and red t-shirt is sitting in a park, best quality" \
--edit_object "hat" \
--source_image_path "assets/red_hat_girl.png" \
--out_dir "output" \
--alpha 0.1 \
--model_path "/path/to/stable-diffusion-3-medium-diffusers"What it does: Disables masking and content fusion entirely.
Result: Colors in non-editing regions may change uncontrollably.
```bash
python run_synthesis_sd3.py --no_mask --alpha 0.3 ...
```

Old mask method (`--use_old_mask`)

What it does: The original paper implementation: computationally efficient, but the mask and generation computations are inconsistent.
Technical Details:
- Mask Calculation: Uses vanilla attention computation
- Image Generation: Uses scaled dot-product attention computation
Result: Background preservation is less accurate, since the mask comes from a different attention computation than the one used for generation.
New mask method (default)

What it does: Our improved method with computational consistency.
Technical Details:
- Both Mask & Generation: Use vanilla attention computation
Result: ✅ Optimal background preservation with perfect computational alignment
Default (recommended): New mask method is used automatically.
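To make the distinction concrete, here is a minimal sketch of the two attention paths. It is illustrative only, not the repo's `consistEdit/attention_control.py`, and the tensor names are assumptions:

```python
# Illustrative sketch, NOT the repo's implementation.
import torch
import torch.nn.functional as F

def vanilla_attention(q, k, v):
    # Explicit softmax(Q K^T / sqrt(d)) V: the attention map `attn` is
    # materialized, so it can be reused for mask calculation.
    scale = q.shape[-1] ** -0.5
    attn = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
    return attn @ v, attn

def fused_attention(q, k, v):
    # Fused kernel: faster and more memory-efficient, but the attention map
    # is never exposed, and its numerics can differ slightly from the
    # explicit path above.
    return F.scaled_dot_product_attention(q, k, v)
```

The default method runs the explicit path for both the mask and the generation, so the mask is derived from exactly the attention that produces the image.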
| Parameter | Type | Default | Description |
|---|---|---|---|
| `--src_prompt` | str | Required | Source image prompt: text description used to generate the source image; defines the initial state before editing. |
| `--tgt_prompt` | str | Required | Target image prompt: text description of the edited result. |
| `--edit_object` | str | Required | Edit object word: a single word or phrase that appears in `src_prompt` and specifies which object to edit; used for mask generation. |
| `--out_dir` | str | `"output"` | Output directory: where generated images and masks are saved. |
| `--alpha` | float | `1.0` | Consistency strength: controls the strength of cross-attention injection (`consistency_strength` in the paper). Range: 0.0-1.0. |
| `--model_path` | str | Required | Model path: local path to the diffusion model directory. |
| `--no_mask` | flag | `False` | Disable masking: when set, no mask is generated and no content fusion is applied. Use this to observe uncontrolled changes. |
| `--use_old_mask` | flag | `False` | Use paper method: enables the original paper's masking approach, which uses scaled dot-product attention for generation (less accurate). |
Additional parameter for real image editing (`run_real_sd3.py`):

| Parameter | Type | Default | Description |
|---|---|---|---|
| `--source_image_path` | str | `"assets/red_hat_girl.png"` | Input image path: the real image to be edited. |
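For intuition about `--alpha`, the sketch below shows one common way a consistency strength blends source-branch attention into the target branch during editing. This is a schematic under assumed names (`attn_src`, `attn_tgt`), not ConsistEdit's exact injection rule:

```python
# Schematic only: NOT the repo's exact injection rule.
import torch

def inject_attention(attn_src: torch.Tensor, attn_tgt: torch.Tensor, alpha: float) -> torch.Tensor:
    # alpha = 1.0 -> reuse the source attention (maximal consistency);
    # alpha = 0.0 -> keep the target attention (maximal edit freedom).
    return alpha * attn_src + (1.0 - alpha) * attn_tgt
```

This matches the demos above: `--alpha 1.0` for strictly consistent edits, `--alpha 0.3` when a style change should be allowed to diverge.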
To generate results for PIE-Bench evaluation:
```bash
python run_metric.py \
--model_path "/path/to/stable-diffusion-3-medium-diffusers" \
--data_path "/path/to/pie-bench-dataset"
```

This script processes the PIE-Bench dataset and generates edited images for quantitative evaluation.
To compute evaluation metrics:
```bash
python evaluate_sd3.py
```
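For reference, background-preservation metrics in PIE-Bench-style evaluation are computed only outside the edit mask. The sketch below is illustrative and is not the repo's `evaluation/matric_calculator.py`:

```python
# Illustrative only: a background-preservation metric in the spirit of
# PIE-Bench evaluation (NOT the repo's metric code).
import numpy as np

def masked_psnr(src: np.ndarray, edited: np.ndarray, edit_mask: np.ndarray) -> float:
    """PSNR computed only outside the edited region (uint8 images in [0, 255])."""
    keep = edit_mask == 0  # background pixels
    mse = np.mean((src[keep].astype(np.float64) - edited[keep].astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)
```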
Project structure:

```
ConsistEdit_Code/
├── run_synthesis_sd3.py        # SD3 synthesis editing
├── run_synthesis_flux.py       # FLUX synthesis editing
├── run_synthesis_cog.py        # CogVideoX editing
├── run_real_sd3.py             # Real image editing
├── run_metric.py               # PIE-Bench evaluation script
├── evaluate_sd3.py             # Metric calculation script
├── demo_sd3_masking.ipynb      # Interactive demonstration
├── script/
│   ├── sd3_consist_edit.sh     # Consistent editing demo
│   └── sd3_inconsist_edit.sh   # Inconsistent editing demo
├── consistEdit/
│   ├── attention_control.py    # Cross-attention mechanisms
│   ├── solver.py               # Diffusion solvers
│   ├── utils.py                # Utility functions
│   └── global_var.py           # Global variables
├── evaluation/
│   └── matric_calculator.py    # Evaluation metrics
└── assets/                     # Sample images
```
This codebase is built upon and inspired by several excellent open-source projects:
- MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing
- PnPInversion: Plug-and-Play diffusion features for text-driven image-to-image translation
- UniEdit-Flow: Unleashing Inversion and Editing in the Era of Flow Models
- DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation
We thank the authors of these works for their valuable contributions to the diffusion model editing community.
If you find this work useful, please cite our paper:
```bibtex
@inproceedings{yin2025consistedit,
  title={ConsistEdit: Highly Consistent and Precise Training-free Visual Editing},
  author={Yin, Zixin and Chen, Ling-Hao and Ni, Lionel and Dai, Xili},
  booktitle={SIGGRAPH Asia 2025 Conference Papers},
  year={2025},
  publisher={ACM},
  doi={10.1145/3757377.3763909},
  address={Hong Kong, China},
  isbn={979-8-4007-2137-3/2025/12}
}
```

