This is a version of AC3D built on CogVideoX. AC3D is a camera-controlled video generation pipeline that follows the Plücker-conditioned ControlNet architecture originally introduced in VD3D.
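The camera trajectory is conditioned on as per-pixel Plücker ray embeddings. Below is a minimal sketch of one common way to compute them; the function name, the pixel-center convention, and the (moment, direction) channel order are illustrative assumptions, not this repository's exact implementation.

```python
# Hedged sketch: per-pixel Plücker ray embeddings from camera intrinsics and pose.
import torch

def plucker_embedding(K: torch.Tensor, c2w: torch.Tensor, H: int, W: int) -> torch.Tensor:
    """K: (3, 3) intrinsics in pixels, c2w: (4, 4) camera-to-world pose.
    Returns a (6, H, W) tensor of (o x d, d) Plücker coordinates per pixel."""
    device, dtype = K.device, K.dtype
    v, u = torch.meshgrid(
        torch.arange(H, device=device, dtype=dtype),
        torch.arange(W, device=device, dtype=dtype),
        indexing="ij",
    )
    # Back-project pixel centers to camera-space ray directions.
    pix = torch.stack([u + 0.5, v + 0.5, torch.ones_like(u)], dim=-1)  # (H, W, 3)
    dirs_cam = pix @ torch.linalg.inv(K).T
    # Rotate rays into world space and normalize.
    dirs_world = dirs_cam @ c2w[:3, :3].T
    d = dirs_world / dirs_world.norm(dim=-1, keepdim=True)
    # Ray origin = camera center in world space.
    o = c2w[:3, 3].expand_as(d)
    moment = torch.cross(o, d, dim=-1)
    return torch.cat([moment, d], dim=-1).permute(2, 0, 1)  # (6, H, W)
```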
Installation
Install PyTorch first (we used PyTorch 2.4.0 with CUDA 12.4), then install the remaining dependencies:
pip install -r requirements.txt
Dataset
Prepare the RealEstate10K dataset following the instructions in CameraCtrl. The dataset path is passed as video_root_dir in the training and inference scripts; the expected folder structure after pre-processing follows CameraCtrl's layout.
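For reference, RealEstate10K camera files store one line of numbers per frame (timestamp, normalized intrinsics, and a row-major 3x4 world-to-camera pose). The sketch below is a hedged example of parsing such a file; the function name is hypothetical and the column layout is an assumption that should be verified against CameraCtrl's pre-processing code.

```python
# Hedged sketch: parse a RealEstate10K camera file into intrinsics and
# world-to-camera extrinsics. Assumes the standard text format (first line is
# the video URL, then per-frame lines of 19 numbers).
import numpy as np

def load_realestate_poses(path: str, width: int, height: int):
    with open(path) as f:
        lines = f.read().strip().splitlines()[1:]  # skip the URL line
    intrinsics, extrinsics = [], []
    for line in lines:
        vals = np.array(line.split(), dtype=np.float64)
        fx, fy, cx, cy = vals[1:5]                 # normalized intrinsics
        K = np.array([[fx * width, 0.0, cx * width],
                      [0.0, fy * height, cy * height],
                      [0.0, 0.0, 1.0]])
        w2c = np.eye(4)
        w2c[:3, :4] = vals[7:19].reshape(3, 4)     # world-to-camera [R | t]
        intrinsics.append(K)
        extrinsics.append(w2c)
    return np.stack(intrinsics), np.stack(extrinsics)
```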
The 2B model requires 48 GB of GPU memory and the 5B model requires 80 GB. On a single node with 8x A100 80 GB GPUs, training takes around 1-2 days for the model to converge.
Training scripts
These scripts fine-tune the ControlNet models on top of a pre-trained base model; a sketch of the general setup follows.
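In a ControlNet-style setup, the pre-trained base transformer stays frozen and only the camera-control branch receives gradients. The snippet below illustrates that pattern only; `CogVideoXTransformer`, `CameraControlNet`, the `control_states` hook, and the simplified noising step are placeholders, not the classes or schedule actually used in this repository.

```python
# Illustrative sketch of ControlNet-style fine-tuning: freeze the base model,
# optimize only the control branch that injects camera features.
import torch
import torch.nn.functional as F

def build_optimizer(base_model: torch.nn.Module,
                    controlnet: torch.nn.Module,
                    lr: float = 1e-4) -> torch.optim.Optimizer:
    # Base weights are frozen; only ControlNet parameters are trainable.
    for p in base_model.parameters():
        p.requires_grad_(False)
    base_model.eval()
    controlnet.train()
    return torch.optim.AdamW(controlnet.parameters(), lr=lr, weight_decay=1e-4)

def training_step(base_model, controlnet, optimizer, latents, plucker, timesteps, text_emb):
    # The control branch consumes Plücker embeddings and returns residual
    # features that are added into the frozen transformer's blocks.
    control_states = controlnet(plucker, timesteps, text_emb)
    noise = torch.randn_like(latents)
    noisy = latents + noise  # stand-in for the real diffusion noise schedule
    pred = base_model(noisy, timesteps, text_emb, control_states=control_states)
    loss = F.mse_loss(pred, noise)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```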
This code uses the original CogVideoX model.
The data processing and data loading pipeline builds upon CameraCtrl.
Cite
@article{bahmani2024ac3d,
  author  = {Bahmani, Sherwin and Skorokhodov, Ivan and Qian, Guocheng and Siarohin, Aliaksandr and Menapace, Willi and Tagliasacchi, Andrea and Lindell, David B. and Tulyakov, Sergey},
  title   = {AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers},
  journal = {arXiv preprint arXiv:2411.18673},
  year    = {2024},
}