🎬 KeyVID: Keyframe-Aware Video Diffusion for Audio-Synchronized Visual Animation

Official repository for **KeyVID**, presented in **“KeyVID: Keyframe-Aware Video Diffusion for Audio-Synchronized Visual Animation.”** This work introduces a unified diffusion framework that generates temporally coherent videos conditioned on audio, guided by adaptive keyframe localization.

📦 Release Plan

Motion-score prediction (run from motion_scores/network):

cd motion_scores/network
python main.py --mode predict

Keyframe Localization — Coming soon
Keyframe Generation — Released
Interpolation Model — Released
Training Code — Coming soon
Checkpoints

⚙️ Environment Setup

We recommend using Python 3.10+ and PyTorch ≥ 2.1.

# Clone the repository
git clone https://github.com/XingruiWang/KeyVID.git
cd KeyVID
# Create environment
conda create -n keyvid python=3.10
conda activate keyvid
# Install dependencies
pip install -r requirements.txt
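
After installation, it can help to confirm that the active environment matches the recommended versions. The snippet below is a minimal sketch, not part of the repository: `version_ge` is a hypothetical helper, and it assumes `sort -V` (GNU coreutils version sort) is available and that `python` resolves to the interpreter of the `keyvid` environment.

```shell
# Succeeds when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort), available in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v python >/dev/null 2>&1; then
    py_ver="$(python -c 'import platform; print(platform.python_version())')"
    version_ge "$py_ver" "3.10" || echo "Python $py_ver is older than the recommended 3.10" >&2

    # torch.__version__ may carry a local suffix such as +cu121; strip it before comparing.
    torch_ver="$(python -c 'import torch; print(torch.__version__)' 2>/dev/null)"
    if [ -n "$torch_ver" ]; then
        version_ge "${torch_ver%%+*}" "2.1" || echo "PyTorch $torch_ver is older than the recommended 2.1" >&2
    else
        echo "PyTorch is not importable in this environment" >&2
    fi
fi
```

The check only prints warnings; it does not modify the environment.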

🚀 Inference

1️⃣ Keyframe Localization

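Detect audio-synchronized keyframes. The release plan lists the standalone localization code as coming soon, and the command below is the same one used for keyframe generation in step 2: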

bash scripts/run_ASVA_evaluation.sh asva_12_kf

2️⃣ Keyframe Generation

Generate keyframes aligned with localized timestamps:

bash scripts/run_ASVA_evaluation.sh asva_12_kf

Configuration example:

config="configs/inference_512_asva_12_keyframe_new_add_idx.yaml"
exp_root="${save_root}/ver_add_idx_add_fps/keyframes"
checkpoint="checkpoints/keyframe_generation/best_checkpoint.ckpt"
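
The configuration above references `${save_root}`, which is not defined in this README, and it expects the config and checkpoint files to exist on disk. The following is a minimal sketch of a pre-flight check under those assumptions: the default value `outputs` for `save_root` and the helper `missing_files` are hypothetical, and how `scripts/run_ASVA_evaluation.sh` actually picks up these variables is not documented here.

```shell
# Hypothetical default; the README does not state where save_root is defined.
save_root="${save_root:-outputs}"

config="configs/inference_512_asva_12_keyframe_new_add_idx.yaml"
exp_root="${save_root}/ver_add_idx_add_fps/keyframes"
checkpoint="checkpoints/keyframe_generation/best_checkpoint.ckpt"

# Report every argument that is not an existing regular file.
# Returns 0 when all files exist, 1 otherwise.
missing_files() {
    rc=0
    for path in "$@"; do
        if [ ! -f "$path" ]; then
            echo "missing: $path" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Only launch the documented command when both inputs are present.
if missing_files "$config" "$checkpoint"; then
    mkdir -p "$exp_root"
    bash scripts/run_ASVA_evaluation.sh asva_12_kf
fi
```

Checking the paths first avoids starting a long inference job that would fail when the checkpoint is loaded.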

3️⃣ Interpolation

Generate smooth video transitions between keyframes:

bash scripts/run_ASVA_evaluation.sh asva_12_kf_interp

Configuration example:

config="configs/inference_512_asva_12_keyframe_kf_freenoise.yaml"
exp_root="${save_root}/ver_add_idx_add_fps/interpolation/"
checkpoint="checkpoints/interpolation/best_checkpoint.ckpt"
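
Because interpolation fills in the frames between generated keyframes, the two released stages are naturally run in order. The sketch below chains them and stops at the first failure. `run_pipeline` and the `RUNNER` override are hypothetical names introduced for illustration, and the assumption that the interpolation stage consumes the keyframe stage's outputs is inferred from the descriptions above rather than confirmed by the scripts.

```shell
# Run keyframe generation, then interpolation; stop at the first failure.
# RUNNER is a hypothetical override for dry runs (for example RUNNER=echo).
run_pipeline() {
    for stage in asva_12_kf asva_12_kf_interp; do
        ${RUNNER:-bash} scripts/run_ASVA_evaluation.sh "$stage" || return 1
    done
}

# Dry run: print the commands that would be executed, without running them.
RUNNER=echo run_pipeline

# Real run (from the repository root):
# run_pipeline
```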

📈 Evaluation

Scripts for quantitative evaluation (e.g., FID, FVD, AlignSync, RelSync) will be added soon.
In the meantime, you can inspect the generated videos in the outputs/ directory for qualitative comparison.
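
To collect the generated videos for side-by-side viewing, a short listing helper can be enough. This is a minimal sketch: `list_videos` is a hypothetical helper, and the `.mp4` extension is an assumption about the output format that may need adjusting.

```shell
# List video files under a directory (default: outputs/), one per line, sorted.
# The .mp4 extension is an assumption; change the pattern to match the actual outputs.
list_videos() {
    find "${1:-outputs}" -type f -name '*.mp4' 2>/dev/null | sort
}

list_videos outputs
```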

📚 Citation

If you find this project useful, please cite:

@article{wang2025keyvid,
  title={KeyVID: Keyframe-Aware Video Diffusion for Audio-Synchronized Visual Animation},
  author={Wang, Xingrui and Liu, Jiang and Wang, Ze and Yu, Xiaodong and Wu, Jialian and Sun, Ximeng and Su, Yusheng and Yuille, Alan and Liu, Zicheng and Barsoum, Emad},
  journal={arXiv preprint arXiv:2504.09656},
  year={2025}
}

