Edit-by-Track
Generative Video Motion Editing with 3D Point Tracks
Yao-Chih Lee1,3  
Zhoutong Zhang2  
Jiahui Huang1  
Jui-Hsien Wang1  
Joon-Young Lee1  
Jia-Bin Huang3   Eli Shechtman1   Zhengqi Li1  
1Adobe Research   
2Adobe   
3University of Maryland College Park
I. Joint Camera & Object Motion Editing
II. Shape Deformation
III. Object Removal & Duplication
IV. Handling Partial Track Inputs
Baseline Comparisons
Our Edit-by-Track Framework
Given a video, we first estimate camera poses and 3D tracks using off-the-shelf models.
Users then edit the estimated poses and 3D tracks to specify the desired camera and object motions.
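As a rough illustration of what such an edit looks like, here is a minimal sketch of two editing operations, one on object tracks and one on camera poses. The data layout and function names are our assumptions, not the paper's interface; we assume world-space 3D tracks of shape (T, N, 3) and camera-to-world poses of shape (T, 4, 4), as produced by an off-the-shelf tracker and pose estimator.

```python
import numpy as np

def translate_object_tracks(tracks, object_mask, offset_per_frame):
    """Object-motion edit: shift one object's 3D tracks over time.

    tracks:           (T, N, 3) world-space 3D point tracks
    object_mask:      (N,) boolean mask selecting the object's tracks
    offset_per_frame: (T, 3) world-space translation added at each frame
    """
    edited = tracks.copy()
    edited[:, object_mask, :] += offset_per_frame[:, None, :]
    return edited

def orbit_cameras(poses_c2w, angles_rad, center):
    """Camera-motion edit: rotate each camera pose about a vertical axis
    through a world-space pivot `center` by a per-frame angle."""
    edited = []
    for pose, a in zip(poses_c2w, angles_rad):
        c, s = np.cos(a), np.sin(a)
        rot_y = np.array([[c, 0., s, 0.],
                          [0., 1., 0., 0.],
                          [-s, 0., c, 0.],
                          [0., 0., 0., 1.]])
        to_pivot = np.eye(4); to_pivot[:3, 3] = -np.asarray(center)
        from_pivot = np.eye(4); from_pivot[:3, 3] = np.asarray(center)
        # apply the world-space rotation about the pivot to the c2w pose
        edited.append(from_pivot @ rot_y @ to_pivot @ pose)
    return np.stack(edited)
```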
We project both the original (source) and edited (target) 3D tracks into 2D screen coordinates using their respective camera parameters, aligning them with the video frames. These projected 3D tracks provide sparse correspondences, guiding our model to transfer visual context from the source video onto the target motion.
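The projection itself is a standard pinhole-camera operation. Below is a minimal sketch (variable names are ours): source tracks are projected with the source poses and edited tracks with the edited poses, yielding paired source/target 2D correspondences. We assume shared intrinsics K and an OpenCV-style camera looking down +z.

```python
import numpy as np

def project_tracks(tracks, poses_c2w, K):
    """tracks:    (T, N, 3) world-space 3D tracks
       poses_c2w: (T, 4, 4) camera-to-world poses
       K:         (3, 3) camera intrinsics
       returns:   (T, N, 2) pixel coordinates and (T, N) camera-space depths"""
    T, N, _ = tracks.shape
    uv = np.empty((T, N, 2))
    depth = np.empty((T, N))
    for t in range(T):
        w2c = np.linalg.inv(poses_c2w[t])                    # world -> camera
        pts_h = np.concatenate([tracks[t], np.ones((N, 1))], axis=1)
        cam = (w2c @ pts_h.T).T[:, :3]                       # camera-space points
        depth[t] = cam[:, 2]
        pix = (K @ cam.T).T                                  # perspective projection
        uv[t] = pix[:, :2] / np.clip(pix[:, 2:3], 1e-6, None)
    return uv, depth
```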
Our model builds on a pretrained text-to-video generation model, further fine-tuned with LoRAs and an additional 3D track conditioner for precise motion control. To preserve the original visual context, we encode the input source video into source video tokens and concatenate them with noisy target video tokens. The 3D track conditioner transforms the projected 3D tracks into paired track tokens, which are added to the corresponding video tokens to guide the motion editing (see our paper for details).
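A schematic sketch of this conditioning pattern is shown below. The module names, shapes, and track-feature layout are illustrative assumptions (a small Transformer stands in for the pretrained video DiT with LoRA adapters); this is not the released architecture.

```python
import torch
import torch.nn as nn

class TrackConditionedDenoiser(nn.Module):
    def __init__(self, token_dim=64, track_feat_dim=4, n_layers=2):
        super().__init__()
        # stand-in for the pretrained video backbone (+ LoRA) -- an assumption
        layer = nn.TransformerEncoderLayer(d_model=token_dim, nhead=4,
                                           batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=n_layers)
        # hypothetical 3D track conditioner: maps per-token track features
        # (e.g., projected 2D location, depth, visibility) to token embeddings
        self.track_conditioner = nn.Linear(track_feat_dim, token_dim)

    def forward(self, source_tokens, noisy_target_tokens,
                source_track_feats, target_track_feats):
        # paired track tokens are added to the corresponding video tokens
        src = source_tokens + self.track_conditioner(source_track_feats)
        tgt = noisy_target_tokens + self.track_conditioner(target_track_feats)
        # concatenate along the token axis so the noisy target tokens can
        # attend to the clean source tokens
        out = self.backbone(torch.cat([src, tgt], dim=1))
        return out[:, src.shape[1]:]     # prediction for the target half only

# toy usage: batch 1, 128 source tokens and 128 target tokens
model = TrackConditionedDenoiser()
tok = lambda: torch.randn(1, 128, 64)
feat = lambda: torch.randn(1, 128, 4)
pred = model(tok(), tok(), feat(), feat())   # -> (1, 128, 64)
```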
Training Data
3D Control: Depth Order and Occlusion Handling
Model Analysis
Failure Cases
References
Motion-controlled image-to-video (I2V) generation methods
- Xiao et al. Trajectory Attention For Fine-grained Video Motion Control. ICLR, 2025.
- Geng et al. Motion Prompting: Controlling Video Generation with Motion Trajectories. CVPR, 2025.
- Wang et al. LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis. CVPR, 2025.
- Jeong et al. Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation. CVPR, 2025.
- Zhang et al. Tora: Trajectory-oriented Diffusion Transformer for Video Generation. CVPR, 2025.
- Burgert et al. Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise. CVPR, 2025.
- Gu et al. Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control. SIGGRAPH, 2025.
- Xing et al. MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation. SIGGRAPH, 2025.
- Chen et al. Perception-as-Control: Fine-Grained Controllable Image Animation with 3D-Aware Motion Representation. ICCV, 2025.
- Wang et al. ATI: Any Trajectory Instruction for Controllable Video Generation. arXiv preprint arXiv:2505.22944, 2025.
- Ren et al. GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control. CVPR, 2025.
- Yu et al. TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models. ICCV, 2025.
- Bai et al. ReCamMaster: Camera-Controlled Generative Rendering from A Single Video. ICCV, 2025.
- Jeong et al. Reangle-A-Video: 4D Video Generation as Video-to-Video Translation. ICCV, 2025.
- Mou et al. ReVideo: Remake a Video with Motion and Content Control. NeurIPS, 2024.
- Ma et al. MagicStick: Controllable Video Editing via Control Handle Transformations. WACV, 2025.
- Liu et al. Shape-for-Motion: Precise and Consistent Video Editing with 3D Proxy. SIGGRAPH Asia, 2025.
A concurrent work by Burgert et al., MotionV2V, also explores video motion editing using 2D point tracks. Their primary focus is on using only a few point tracks for more user-friendly editing.
We also thank the authors of Wan2.1, DiffSynth-Studio, SpatialTrackerV2, and TAPIP3D for open-sourcing their code and models.
Societal Impact
We recognize that powerful video editing tools, including ours, may raise ethical considerations depending on context.
While the work aims to augment human creativity and professional workflows, such capabilities could potentially be misused.
We encourage responsible use aligned with community guidelines and transparency about any edits applied.
Acknowledgements
We are grateful for the valuable feedback and insightful discussions provided by
Yihong Sun, Linyi Jin, Yiran Xu, Quynh Phung, Dekel Galor, Chun-Hao Paul Huang, Tianyu (Steve) Wang, Ilya Chugunov, Jiawen Chen, Marc Levoy,
Wei-Chiu Ma, Ting-Hsuan Liao, Hadi Alzayer, Yi-Ting Chen, Vinayak Gupta, Yu-Hsiang Huang, and Shu-Jung Han.
BibTeX
@article{lee2025editbytrack,
author = {Lee, Yao-Chih and Zhang, Zhoutong and Huang, Jiahui and Wang, Jui-Hsien and Lee, Joon-Young and Huang, Jia-Bin and Shechtman, Eli and Li, Zhengqi},
title = {Generative Video Motion Editing with 3D Point Tracks},
journal = {arXiv preprint arXiv:2512.02015},
year = {2025},
}