Given a set of bounding boxes with associated trajectories, our framework enables object and camera motion control in image-to-video generation by leveraging the knowledge present in a pre-trained image-to-video diffusion model. Our method is self-guided, offering zero-shot trajectory control without fine-tuning or relying on external knowledge.
An example command that produces the same result as the notebook is `CUDA_VISIBLE_DEVICES=0 python inference.py --input_dir ./examples/111 --output_dir ./output`. For convenience, we also provide a shell script that generates all the examples; run it with `sh ./inference.sh`.
For the input format of the examples, please refer to `read_condition(input_dir, config)` in `inference.py` for details. Briefly, each example folder contains the first-frame image (`img.png`) and the trajectory conditions (`traj.npy`); each trajectory condition encodes the top-left/bottom-right coordinates of a bounding box together with the positions of its center across frames (see the sketch below).
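The authoritative array layout is whatever `read_condition(input_dir, config)` expects; the following is only a minimal, hypothetical sketch of how one such trajectory condition could be assembled for a custom example. The frame count, pixel coordinates, and packing order below are assumptions for illustration, not the repository's guaranteed format.

```python
# Hypothetical sketch: build a single trajectory condition for one bounding box.
# The exact shapes/packing expected by traj.npy are defined in
# read_condition(input_dir, config) in inference.py -- treat this as an example only.
import numpy as np

num_frames = 14                                        # assumed clip length

# Bounding box on the first frame: top-left and bottom-right corners, in pixels.
box = np.array([120.0, 80.0, 260.0, 200.0])            # (x1, y1, x2, y2)

# Desired center position of the box at every frame, e.g. a straight
# horizontal motion of 100 pixels over the clip.
cx0, cy0 = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
centers = np.stack(
    [np.linspace(cx0, cx0 + 100.0, num_frames),        # x center per frame
     np.full(num_frames, cy0)],                        # y center per frame
    axis=-1,
)                                                      # shape: (num_frames, 2)

# One possible packing: first-frame corners followed by the flattened center track.
traj = np.concatenate([box, centers.reshape(-1)])      # shape: (4 + 2 * num_frames,)
np.save("traj.npy", traj[None])                        # one row per bounding box
```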
Reproducing quantitative results
We are currently working on releasing the evaluation code.
✏️ Acknowledgement
Our implementation is partially inspired by DragAnything and FreeTraj. We thank the authors for their open-source contributions.
📖 Citation
If you find our paper and code useful, please cite us:
@article{namekata2024sgi2v,
  author  = {Namekata, Koichi and Bahmani, Sherwin and Wu, Ziyi and Kant, Yash and Gilitschenski, Igor and Lindell, David B.},
  title   = {SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation},
  journal = {arXiv preprint arXiv:2411.04989},
  year    = {2024},
}