Given a set of bounding boxes with associated trajectories, our framework enables object and camera motion control in image-to-video generation by leveraging the knowledge present in a pre-trained image-to-video diffusion model. Our method is self-guided, offering zero-shot trajectory control without fine-tuning or relying on external knowledge.
An example command that produces the same result as the notebook is `CUDA_VISIBLE_DEVICES=0 python inference.py --input_dir ./examples/111 --output_dir ./output`. For convenience, we also provide a shell script that generates all the examples; run it with `sh ./inference.sh`.
For the input format of the examples, please refer to `read_condition(input_dir, config)` in `inference.py` for details. Briefly, each example folder contains the first-frame image (`img.png`) and the trajectory conditions (`traj.npy`); each trajectory condition encodes the top-left/bottom-right coordinates of a bounding box together with the positions of its center across frames (see the sketch below).
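The authoritative array layout is whatever `read_condition(input_dir, config)` expects; the following is only a minimal, hypothetical sketch of how one such trajectory condition could be assembled for a custom example. The frame count, pixel coordinates, and packing order below are assumptions for illustration, not the repository's guaranteed format.

```python
# Hypothetical sketch: build a single trajectory condition for one bounding box.
# The exact shapes/packing expected by traj.npy are defined in
# read_condition(input_dir, config) in inference.py -- treat this as an example only.
import numpy as np

num_frames = 14                                        # assumed clip length

# Bounding box on the first frame: top-left and bottom-right corners, in pixels.
box = np.array([120.0, 80.0, 260.0, 200.0])            # (x1, y1, x2, y2)

# Desired center position of the box at every frame, e.g. a straight
# horizontal motion of 100 pixels over the clip.
cx0, cy0 = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
centers = np.stack(
    [np.linspace(cx0, cx0 + 100.0, num_frames),        # x center per frame
     np.full(num_frames, cy0)],                        # y center per frame
    axis=-1,
)                                                      # shape: (num_frames, 2)

# One possible packing: first-frame corners followed by the flattened center track.
traj = np.concatenate([box, centers.reshape(-1)])      # shape: (4 + 2 * num_frames,)
np.save("traj.npy", traj[None])                        # one row per bounding box
```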
Reproducing quantitative results
We are currently working on releasing the evaluation code.
✏️ Acknowledgement
Our implementation is partially inspired by DragAnything and FreeTraj. We thank the authors for their open-source contributions.
📖 Citation
If you find our paper and code useful, please cite us:
@article{namekata2024sgi2v,
  author  = {Namekata, Koichi and Bahmani, Sherwin and Wu, Ziyi and Kant, Yash and Gilitschenski, Igor and Lindell, David B.},
  title   = {SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation},
  journal = {arXiv preprint arXiv:2411.04989},
  year    = {2024},
}