[WACV 2025] Follow-Your-Handle: This repo is the official implementation of "MagicStick: Controllable Video Editing via Control Handle Transformations"
TL;DR: MagicStick is the first unified framework to modify video properties (e.g., shape, size, location, motion) by leveraging keyframe transformations of the extracted internal control signals.
CLICK for the full abstract
Text-based video editing has recently attracted considerable interest for changing the style of a video or replacing objects with others of similar structure. Beyond this, we demonstrate that properties such as shape, size, location, and motion can also be edited in videos. Our key insight is that transformations applied to a keyframe of a specific internal feature (e.g., edge maps of objects or human pose) can easily be propagated to other frames to provide generation guidance. We thus propose MagicStick, a controllable video editing method that edits video properties by utilizing transformations of the extracted internal control signals. In detail, to preserve appearance, we inflate both the pretrained image diffusion model and ControlNet along the temporal dimension and train low-rank adaptation (LoRA) layers to fit the specific scene. For editing, we then employ an inversion-and-editing framework in which, unlike prior work, the fine-tuned ControlNet is introduced in both inversion and generation to provide attention guidance, using the proposed attention remix between the spatial attention maps of inversion and editing. Despite its simplicity, our method is the first to demonstrate video property editing with a pre-trained text-to-image model. We present experiments on numerous examples within our unified framework. We also compare against shape-aware text-based editing and handcrafted motion video generation, demonstrating superior temporal consistency and editing capability compared with previous works.
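The temporal inflation mentioned above follows the common pseudo-3D recipe: reuse the pretrained 2D spatial layers unchanged and append an identity-initialized temporal layer, so the inflated network initially reproduces the image model frame by frame. Below is a minimal PyTorch sketch of this idea; the class name `PseudoConv3d` and the exact layer placement are illustrative assumptions, not the released code:

```python
import torch
import torch.nn as nn
from einops import rearrange

class PseudoConv3d(nn.Module):
    """Wrap a pretrained 2D conv with an identity-initialized temporal conv.

    Illustrative sketch only; the real model's layer placement may differ.
    """
    def __init__(self, conv2d: nn.Conv2d, num_frames: int):
        super().__init__()
        self.spatial = conv2d              # pretrained image weights, reused as-is
        self.num_frames = num_frames
        c = conv2d.out_channels
        self.temporal = nn.Conv1d(c, c, kernel_size=3, padding=1)
        nn.init.dirac_(self.temporal.weight)   # identity mapping at initialization
        nn.init.zeros_(self.temporal.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch * frames, channels, height, width)
        x = self.spatial(x)
        _, c, h, w = x.shape
        # fold spatial positions into the batch, convolve across frames
        x = rearrange(x, '(b f) c h w -> (b h w) c f', f=self.num_frames)
        x = self.temporal(x)
        x = rearrange(x, '(b h w) c f -> (b f) c h w', h=h, w=w)
        return x
```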
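The attention remix can likewise be pictured as a masked blend of the two passes' spatial attention maps: keep the editing pass's attention inside the edited region and the inversion pass's attention elsewhere, so the background stays faithful to the source video. A hypothetical sketch follows; the function name `attn_remix`, the mask semantics, and the tensor shapes are our assumptions, so please see the paper for the actual formulation:

```python
import torch

def attn_remix(attn_inv: torch.Tensor,
               attn_edit: torch.Tensor,
               edit_mask: torch.Tensor) -> torch.Tensor:
    """Blend spatial self-attention maps from inversion and editing.

    attn_inv / attn_edit: (batch, heads, query_tokens, key_tokens)
    edit_mask: binary mask over the query tokens, shape (query_tokens,);
               1 inside the edited region, 0 outside.
    """
    m = edit_mask.view(1, 1, -1, 1).to(attn_edit.dtype)
    # edited region follows the editing pass; background follows inversion
    return m * attn_edit + (1.0 - m) * attn_inv
```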
📋 Changelog
2023.12.01: Code and paper released!
🚧 Todo
Release the edit config and data for all results, plus the Tune-A-Video optimization
Memory and runtime profiling, and editing-guidance documents
Colab and Hugging Face demos
Code refactoring
Time and memory optimization
Release more applications
Object Size Editing
We show the difference between the source prompt and the target prompt in the box below each video.
Note: the mp4 and GIF files on this GitHub page are compressed.
Please check our Project Page for the original mp4 video editing results.
Object Position Editing
Object Appearance Editing
"Truck ➜ Bus"
"Truck ➜ Train"
"A swan ➜ A flamingo"
"A swan ➜ A duck"
📀 Demo Video
demo.mp4
📍 Citation
If you find this project helpful, please feel free to leave a star ⭐️⭐️⭐️ and cite our paper:
@article{ma2023magicstick,
  title={MagicStick: Controllable Video Editing via Control Handle Transformations},
  author={Ma, Yue and Cun, Xiaodong and He, Yingqing and Qi, Chenyang and Wang, Xintao and Shan, Ying and Li, Xiu and Chen, Qifeng},
  journal={arXiv preprint arXiv:2312.03047},
  year={2023}
}
💗 Acknowledgements
This repository borrows heavily from FateZero and FollowYourPose. Thanks to the authors for sharing their code and models.
🧿 Maintenance
This is the codebase for our research work. We are still working hard to update this repo, and more details are coming soon. If you have any questions or ideas to discuss, feel free to contact Yue Ma or Xiaodong Cun.