MotionMaster: Training-free Camera Motion Transfer For Video Generation
Jieyu Weng1, Yabiao Wang2, Lizhuang Ma1
2Youtu Lab, Tencent, Shanghai, China
3Harbin Institute of Technology, Harbin, China
Abstract
The emergence of diffusion models has greatly propelled progress in image and video generation. Recently, much effort has been devoted to controllable video generation, including text-to-video generation, image-to-video generation, video editing, and video motion control, among which camera motion control is an important topic. However, existing camera motion control methods rely on training a temporal camera module and require substantial computational resources due to the large number of parameters in video generation models. Moreover, existing methods pre-define camera motion types during training, which limits their flexibility in camera control and prevents the realization of specific camera controls, such as the varied camera movements used in films. Therefore, to reduce training costs and achieve flexible camera control, we propose MotionMaster, a novel training-free video motion transfer model that disentangles camera motions and object motions in source videos and transfers the extracted camera motions to new videos. We first propose a one-shot camera motion disentanglement method to extract camera motion from a single source video, which separates the moving objects from the background and estimates the camera motion in the moving-object regions from the motion in the background by solving a Poisson equation. Furthermore, we propose a few-shot camera motion disentanglement method to extract the common camera motion from multiple videos with similar camera motions, which employs a window-based clustering technique to extract the common features in the temporal attention maps of multiple videos. Finally, we propose a motion combination method to combine different types of camera motions, enabling more controllable and flexible camera control.
Extensive experiments demonstrate that our training-free approach can effectively decouple camera-object motion and apply the decoupled camera motion to a wide range of controllable video generation tasks, achieving flexible and diverse camera motion control.
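As a rough illustration of the Poisson-based completion step described in the abstract, the sketch below fills the moving-object region of a motion field by iteratively solving the discrete Laplace equation (the homogeneous Poisson problem), using the surrounding background motion as Dirichlet boundary values. The function name, array layout, and the simple Jacobi solver are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def complete_camera_motion(motion, object_mask, iters=500):
    """Fill the moving-object region of one component of a 2D motion field
    (shape (H, W)) by solving a discrete Laplace equation, so the camera
    motion inside the mask is smoothly interpolated from the background.
    `object_mask` is a boolean (H, W) array, True where moving objects
    occlude the background motion. Illustrative Jacobi-iteration sketch.
    """
    u = motion.astype(float).copy()
    u[object_mask] = 0.0  # unknown interior values start at zero
    for _ in range(iters):
        # discrete harmonic update: average of the four neighbours
        avg = 0.25 * (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
                      np.roll(u, 1, 1) + np.roll(u, -1, 1))
        u[object_mask] = avg[object_mask]  # update only inside the mask
    return u
```

For a constant background motion, the filled region converges to that same constant, which is the expected behaviour of a harmonic interpolant.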
Our demo is divided into two parts: first, all of our experimental results; second, a gallery showcasing various effects.
Source Video & Swap Attention Map
We find that the temporal attention maps determine the motion of the generated videos. By swapping the temporal attention maps between the two videos in the first row, we obtain the results in the second row, whose video motions are completely switched.
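A minimal sketch of the attention-map swap: the temporal attention map computed from the source video's queries and keys is applied to the target video's values, so the source's inter-frame motion drives the target's content. The function name and the (tokens, frames, dim) tensor layout are assumptions for illustration, not the actual MotionMaster code.

```python
import numpy as np

def swapped_temporal_attention(q_src, k_src, v_tgt):
    """Temporal attention where the map A = softmax(q_src k_src^T / sqrt(d))
    comes from the *source* video, while the values come from the *target*
    video. Inputs are assumed to have shape (tokens, frames, dim)."""
    d = q_src.shape[-1]
    logits = q_src @ np.swapaxes(k_src, -1, -2) / np.sqrt(d)
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=-1, keepdims=True)      # rows sum to 1
    return attn @ v_tgt
```

Because each attention row is a convex combination over frames, constant target values pass through unchanged, which is a quick sanity check on the normalization.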
One-Shot Camera Motion Transfer
Here are the one-shot camera motion transfer results compared to AnimateDiff+LoRA and MotionCtrl.
Few-Shot Camera Motion Transfer
Here are the few-shot camera motion transfer results compared to AnimateDiff+LoRA and MotionCtrl.
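The window-based extraction of common camera motion can be sketched roughly as follows: within each spatial window, the temporal-attention features of all videos are compared, videos whose features deviate strongly (likely containing object motion) are discarded, and the remaining features are averaged. This simplified outlier filtering stands in for the paper's clustering step; the (n_videos, H, W) layout, function name, and threshold are assumptions.

```python
import numpy as np

def common_camera_motion(attn_maps, win=4, thresh=1.0):
    """Estimate the camera motion shared by several videos from their
    stacked temporal-attention features of shape (n_videos, H, W).
    Per spatial window, videos far from the cross-video mean are dropped
    and the rest are averaged. Simplified stand-in for clustering."""
    n, H, W = attn_maps.shape
    out = np.empty((H, W))
    for i in range(0, H, win):
        for j in range(0, W, win):
            block = attn_maps[:, i:i+win, j:j+win]      # (n, h, w)
            vals = block.reshape(n, -1)
            dist = np.linalg.norm(vals - vals.mean(axis=0), axis=1)
            keep = dist <= thresh * (dist.mean() + 1e-8)  # drop outliers
            out[i:i+win, j:j+win] = vals[keep].mean(axis=0).reshape(block.shape[1:])
    return out
```

When most videos agree on the camera motion, the outlier video's object motion is filtered out and the shared motion is recovered.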
Camera Motion Combination
Here are the results of combining different camera motions.
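One simple way to combine camera motions, in the spirit of the dolly-zoom examples in the gallery, is to assign different camera-motion attention maps to different spatial regions (e.g. a zooming background and a fixed foreground). The sketch below assumes a (spatial, frames, frames) attention layout and a per-position region mask; both are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def combine_camera_motions(attn_a, attn_b, region_mask):
    """Combine two camera-motion temporal-attention maps, each of shape
    (spatial, frames, frames): spatial positions where `region_mask` is
    True follow motion A, the rest follow motion B."""
    mask = region_mask[:, None, None]  # broadcast over both frame axes
    return np.where(mask, attn_a, attn_b)
```

Region-wise composition like this is what makes effects such as a dolly zoom expressible: the two regions are simply driven by different camera motions.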
Variable-Speed Zoom
Here are the results of variable-speed zoom in real films.
Ablation Study On One-Shot Camera Motion Disentanglement
Here are ablation studies on the one-shot camera motion disentanglement.
Ablation Study On Few-Shot Camera Motion Disentanglement
Here are ablation studies on the few-shot camera motion disentanglement.
Gallery
Zoom In
Zoom Out
Pan Left
Pan Right
Left + In
Left + Out
Right + In
Right + Out
Variable-speed Zoom In
Variable-speed Zoom Out
Dolly Zoom In
The camera in the background zooms in while the camera in the foreground remains fixed.
Dolly Zoom Out
The camera in the background zooms out while the camera in the foreground remains fixed.