This is a version of AC3D built on CogVideoX. AC3D is a camera-controlled video generation pipeline that follows the Plücker-conditioned ControlNet architecture originally introduced in VD3D.
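The camera trajectory is conditioned on as per-pixel Plücker ray embeddings. Below is a minimal sketch of one common way to compute them; the function name, the pixel-center convention, and the (moment, direction) channel order are illustrative assumptions, not this repository's exact implementation.

```python
# Hedged sketch: per-pixel Plücker ray embeddings from camera intrinsics and pose.
import torch

def plucker_embedding(K: torch.Tensor, c2w: torch.Tensor, H: int, W: int) -> torch.Tensor:
    """K: (3, 3) intrinsics in pixels, c2w: (4, 4) camera-to-world pose.
    Returns a (6, H, W) tensor of (o x d, d) Plücker coordinates per pixel."""
    device, dtype = K.device, K.dtype
    v, u = torch.meshgrid(
        torch.arange(H, device=device, dtype=dtype),
        torch.arange(W, device=device, dtype=dtype),
        indexing="ij",
    )
    # Back-project pixel centers to camera-space ray directions.
    pix = torch.stack([u + 0.5, v + 0.5, torch.ones_like(u)], dim=-1)  # (H, W, 3)
    dirs_cam = pix @ torch.linalg.inv(K).T
    # Rotate rays into world space and normalize.
    dirs_world = dirs_cam @ c2w[:3, :3].T
    d = dirs_world / dirs_world.norm(dim=-1, keepdim=True)
    # Ray origin = camera center in world space.
    o = c2w[:3, 3].expand_as(d)
    moment = torch.cross(o, d, dim=-1)
    return torch.cat([moment, d], dim=-1).permute(2, 0, 1)  # (6, H, W)
```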
Installation
Install PyTorch first (we used PyTorch 2.4.0 with CUDA 12.4), then install the remaining dependencies:
pip install -r requirements.txt
Dataset
Prepare the RealEstate10K dataset following the instructions in CameraCtrl. The dataset path is passed as video_root_dir in the training and inference scripts; the expected folder structure after pre-processing follows CameraCtrl's layout.
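For reference, RealEstate10K camera files store one line of numbers per frame (timestamp, normalized intrinsics, and a row-major 3x4 world-to-camera pose). The sketch below is a hedged example of parsing such a file; the function name is hypothetical and the column layout is an assumption that should be verified against CameraCtrl's pre-processing code.

```python
# Hedged sketch: parse a RealEstate10K camera file into intrinsics and
# world-to-camera extrinsics. Assumes the standard text format (first line is
# the video URL, then per-frame lines of 19 numbers).
import numpy as np

def load_realestate_poses(path: str, width: int, height: int):
    with open(path) as f:
        lines = f.read().strip().splitlines()[1:]  # skip the URL line
    intrinsics, extrinsics = [], []
    for line in lines:
        vals = np.array(line.split(), dtype=np.float64)
        fx, fy, cx, cy = vals[1:5]                 # normalized intrinsics
        K = np.array([[fx * width, 0.0, cx * width],
                      [0.0, fy * height, cy * height],
                      [0.0, 0.0, 1.0]])
        w2c = np.eye(4)
        w2c[:3, :4] = vals[7:19].reshape(3, 4)     # world-to-camera [R | t]
        intrinsics.append(K)
        extrinsics.append(w2c)
    return np.stack(intrinsics), np.stack(extrinsics)
```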
The 2B model requires 48 GB of GPU memory and the 5B model requires 80 GB. On a single node with 8x A100 80 GB GPUs, training takes around 1-2 days for the model to converge.
Training scripts
These scripts fine-tune the ControlNet models on top of a pre-trained base model; a sketch of the general setup follows.
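In a ControlNet-style setup, the pre-trained base transformer stays frozen and only the camera-control branch receives gradients. The snippet below illustrates that pattern only; `CogVideoXTransformer`, `CameraControlNet`, the `control_states` hook, and the simplified noising step are placeholders, not the classes or schedule actually used in this repository.

```python
# Illustrative sketch of ControlNet-style fine-tuning: freeze the base model,
# optimize only the control branch that injects camera features.
import torch
import torch.nn.functional as F

def build_optimizer(base_model: torch.nn.Module,
                    controlnet: torch.nn.Module,
                    lr: float = 1e-4) -> torch.optim.Optimizer:
    # Base weights are frozen; only ControlNet parameters are trainable.
    for p in base_model.parameters():
        p.requires_grad_(False)
    base_model.eval()
    controlnet.train()
    return torch.optim.AdamW(controlnet.parameters(), lr=lr, weight_decay=1e-4)

def training_step(base_model, controlnet, optimizer, latents, plucker, timesteps, text_emb):
    # The control branch consumes Plücker embeddings and returns residual
    # features that are added into the frozen transformer's blocks.
    control_states = controlnet(plucker, timesteps, text_emb)
    noise = torch.randn_like(latents)
    noisy = latents + noise  # stand-in for the real diffusion noise schedule
    pred = base_model(noisy, timesteps, text_emb, control_states=control_states)
    loss = F.mse_loss(pred, noise)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```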
This code uses the original CogVideoX model.
The data processing and data loading pipeline builds upon CameraCtrl.
Cite
@article{bahmani2024ac3d,
  author  = {Bahmani, Sherwin and Skorokhodov, Ivan and Qian, Guocheng and Siarohin, Aliaksandr and Menapace, Willi and Tagliasacchi, Andrea and Lindell, David B. and Tulyakov, Sergey},
  title   = {AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers},
  journal = {arXiv preprint arXiv:2411.18673},
  year    = {2024},
}