UniEdit-Flow: Unleashing Inversion and Editing in the Era of Flow Models
1Tsinghua University, 2Snap Inc., 3The University of British Columbia
TL;DR: A highly accurate, efficient, model-agnostic, training- and tuning-free sampling strategy for inversion and editing tasks. Supports text-driven image 🎨 (FLUX, Stable Diffusion 3, Stable Diffusion XL, etc.) and video 🎥 (Wan, a flow-based video generation model) editing.
💜 Overview
In this work, we introduce a predictor-corrector-based framework for inversion and editing in flow models.
First, we propose Uni-Inv, an effective inversion method designed for accurate reconstruction.
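To make the predictor-corrector idea concrete, below is a minimal sketch of a Heun-style predictor-corrector inversion loop for a velocity-prediction flow model. The function name, signature, and the specific correction rule here are illustrative assumptions; the exact Uni-Inv update is given in our paper.

```python
import torch

@torch.no_grad()
def predictor_corrector_invert(velocity_fn, x0, timesteps):
    """Invert a clean sample x0 back to noise with a flow model.

    velocity_fn(x, t) -> v is the model's predicted velocity field.
    This is a generic Heun-style sketch, not the exact Uni-Inv rule.
    """
    x = x0
    for t_cur, t_next in zip(timesteps[:-1], timesteps[1:]):
        dt = t_next - t_cur
        # Predictor: Euler step using the velocity at the current point.
        v_pred = velocity_fn(x, t_cur)
        x_pred = x + dt * v_pred
        # Corrector: re-evaluate at the predicted point and average,
        # which reduces the drift that plain Euler inversion accumulates.
        v_corr = velocity_fn(x_pred, t_next)
        x = x + dt * 0.5 * (v_pred + v_corr)
    return x
```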
Building on this, we extend the concept of delayed injection to flow models and introduce Uni-Edit, a region-aware, robust image editing approach.
Our methodology is tuning-free, model-agnostic, efficient, and effective, enabling diverse edits while ensuring strong preservation of edit-irrelevant regions.
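For intuition about delayed injection, here is a hypothetical sketch of an editing loop that follows the source prompt for an initial fraction of the denoising steps before switching to the target prompt, so structure is preserved while the edit takes effect. The `delay_ratio` parameter and the three-argument `velocity_fn` are assumptions for illustration; Uni-Edit's actual region-aware blending is described in the paper.

```python
import torch

@torch.no_grad()
def delayed_injection_edit(velocity_fn, x_T, src_prompt, tgt_prompt,
                           timesteps, delay_ratio=0.2):
    """Denoise inverted noise x_T, delaying the target prompt.

    The first delay_ratio fraction of steps follow the source-prompt
    velocity to preserve layout; the remaining steps follow the
    target-prompt velocity to realize the edit. Illustrative sketch.
    """
    x = x_T
    n_delay = int(delay_ratio * (len(timesteps) - 1))
    steps = zip(timesteps[:-1], timesteps[1:])
    for i, (t_cur, t_next) in enumerate(steps):
        dt = t_next - t_cur
        prompt = src_prompt if i < n_delay else tgt_prompt
        # Plain Euler step; in practice this pairs with the
        # predictor-corrector update sketched above.
        x = x + dt * velocity_fn(x, t_cur, prompt)
    return x
```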
✨ Feature: Text-driven Image / Video Editing
More results can be found on our project page.
🎨 Image Editing
| Editing Prompt | Source Image | FLUX | Stable Diffusion 3 | Stable Diffusion XL |
| --- | --- | --- | --- | --- |
| A ~~long~~ short haired cat with blue eyes looking up at something. | *(image)* | *(image)* | *(image)* | *(image)* |
| Two origami birds sitting on a branch. | *(image)* | *(image)* | *(image)* | *(image)* |
| A clown in pixel art style with colorful hair. | *(image)* | *(image)* | *(image)* | *(image)* |
🎥 Video Editing
| Editing Prompt | Source Video | Wan + Uni-Edit |
| --- | --- | --- |
| A young rider wearing full protective gear, including a black helmet and motocross-style outfit, is navigating a ~~BMX bike~~ motorcycle over a series of sandy dirt bumps on a track enclosed by a fence... | *(video)* | *(video)* |
| A ~~koala~~ cat with thick gray fur is captured mid-motion as it reaches out with its front paws to climb or move between tree branches, surrounded by lush green leaves and dappled sunlight in a forested area. | *(video)* | *(video)* |
👨‍💻 Implementation
Here we provide two implementation options:
Implementation by diffusers: Supports FLUX (e.g., black-forest-labs/FLUX.1-dev), Stable Diffusion 3 (e.g., stabilityai/stable-diffusion-3-medium), Stable Diffusion XL (e.g., SG161222/RealVisXL_V4.0), etc., for text-driven image editing, as well as Wan (e.g., Wan-AI/Wan2.1-T2V-1.3B-Diffusers) for text-driven video editing; a minimal loading sketch is shown below.
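As a starting point, the supported backbones load through the standard diffusers API, as in this sketch. The commented editing call is hypothetical; consult the scripts in this repository for the actual entry points.

```python
import torch
from diffusers import FluxPipeline

# Load one of the supported diffusers checkpoints as the backbone.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Hypothetical entry point -- the real function/script names may differ:
# edited_image = run_uni_edit(pipe, image, source_prompt, target_prompt)
```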
We sincerely thank FireFlow, RF-Solver, and FLUX for their awesome work!
Additionally, we would like to thank PnpInversion for providing a comprehensive baseline survey and implementations, as well as their great benchmark.
📑 Cite Us
If you like our work, please cite our paper using the BibTeX below. Thanks for your attention!
```bibtex
@misc{jiao2025unieditflowunleashinginversionediting,
      title={UniEdit-Flow: Unleashing Inversion and Editing in the Era of Flow Models},
      author={Guanlong Jiao and Biqing Huang and Kuan-Chieh Wang and Renjie Liao},
      year={2025},
      eprint={2504.13109},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2504.13109},
}
```