Welcome to the repository for Ultra-low Bitrate Video Compression Using Deep Animation Models. This codebase implements methods and models described in cutting-edge research on low-bitrate video conferencing and animation-based video compression. The repository is designed to serve researchers and developers interested in leveraging deep learning for video compression.
This repository accompanies the following papers:
- Ultra-Low Bitrate Video Conferencing Using Deep Image Animation
- A Hybrid Deep Animation Codec for Low-Bitrate Video Conferencing
- Improving Reconstruction Fidelity in Generative Face Video Coding Using High-Frequency Shuttling
- Predictive Coding for Animation-Based Video Compression
- Improved Predictive Coding for Animation-Based Video Compression
- Multi-Reference Generative Face Video Compression with Contrastive Learning
The codebase requires Python 3. To set up the environment, clone the repository and install the required dependencies:
pip install -r requirements.txt
The YAML configuration files define the settings for training and testing the models. Example files are located in the `train_config` and `test_config` directories:
- `[train/test]_config/dac.yaml`
- `[train/test]_config/hdac.yaml`
- `[train/test]_config/rdac.yaml`

During inference, use the `--mode test` flag with the same configuration file after updating the `eval_params` section appropriately.
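The `eval_params` section can also be inspected or edited programmatically before a test run. The sketch below is a minimal illustration assuming PyYAML is installed; the key it sets (`num_frames`) is a hypothetical placeholder, so check the actual schema in the `test_config` files before relying on it.

```python
# Minimal sketch: inspect and update the eval_params section of a test config.
# Assumes PyYAML is installed; the "num_frames" key is a hypothetical placeholder --
# consult the actual schema in test_config/*.yaml.
import yaml

config_path = "test_config/dac.yaml"

with open(config_path, "r") as f:
    config = yaml.safe_load(f)

print("Current eval_params:", config.get("eval_params"))

# Hypothetical edit: limit evaluation to the first 128 frames of each sequence.
config["eval_params"] = config.get("eval_params") or {}
config["eval_params"]["num_frames"] = 128

with open(config_path, "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)
```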
- VoxCeleb: Follow the instructions in the video-preprocessing repository to prepare the dataset.
- Creating Your Own Videos: Ensure that input videos are cropped to focus on the speaker’s face at a resolution of 256x256 pixels (see the preprocessing sketch after this list). Support for higher resolutions is under development.
- Pre-processed Videos (256x256 px): Available for download from our Google Drive link. Place these videos in the following folders:
  - `datasets/train`
  - `datasets/inference`
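When creating your own videos, a rough face-centered crop followed by a resize to 256x256 can be scripted with OpenCV, as in the minimal sketch below. This is only an illustration, not the repository's preprocessing pipeline, and the fixed crop box is a hypothetical placeholder that would normally come from a face detector.

```python
# Minimal sketch: crop each frame around the speaker and resize to 256x256.
# Illustration only, not the official preprocessing; the crop box below is a
# hypothetical placeholder that you would normally derive from a face detector.
import cv2

def preprocess_video(src_path: str, dst_path: str, crop_box=(0, 0, 720, 720)):
    x, y, w, h = crop_box
    reader = cv2.VideoCapture(src_path)
    fps = reader.get(cv2.CAP_PROP_FPS) or 25.0
    writer = cv2.VideoWriter(
        dst_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (256, 256)
    )
    while True:
        ok, frame = reader.read()
        if not ok:
            break
        face = frame[y:y + h, x:x + w]        # crop around the face region
        face = cv2.resize(face, (256, 256))   # match the model input size
        writer.write(face)
    reader.release()
    writer.release()

preprocess_video("raw_talking_head.mp4", "datasets/inference/sample_256.mp4")
```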
Our metrics module incorporates suggestions from JPEG-AI alongside popular quantitative metrics used in computer vision. Supported metrics include `psnr`, `psnr-hvs`, `fsim`, `iw_ssim`, `ms_ssim`, `vif`, `nlpd`, `vmaf`, `lpips`, and `msVGG`.
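As a quick sanity check outside the metrics module, PSNR between a reference and a decoded frame can be computed directly, as in the sketch below. This is an independent illustration of the metric, not the repository's implementation; perceptual metrics such as `lpips` and `vmaf` require their dedicated packages.

```python
# Minimal sketch: PSNR between a reference frame and its reconstruction.
# Independent illustration of the metric, not the repository's metrics module.
import numpy as np

def psnr(reference: np.ndarray, reconstruction: np.ndarray, max_val: float = 255.0) -> float:
    mse = np.mean((reference.astype(np.float64) - reconstruction.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10((max_val ** 2) / mse)

# Stand-in frames: a random reference and a lightly perturbed reconstruction.
ref = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
dec = np.clip(ref + np.random.randint(-5, 6, ref.shape), 0, 255).astype(np.uint8)
print(f"PSNR: {psnr(ref, dec):.2f} dB")
```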
To train a model, update the relevant parameters in the corresponding `train_config/[MODEL_NAME].yaml` file or use the default configuration (to reproduce our results), then run the following command:
bash training_script.sh [MODEL_NAME]
Note: The default setup requires 2 x A40 GPUs. Adjust the batch size in the configuration file if using a different hardware setup.
To test a model, update the `eval_params` section in the corresponding `test_config/[MODEL_NAME].yaml` file and run:
bash test_script.sh [MODEL_NAME]
Refer to JVET-AH0114, its subsequent documentation, and the reference software for the common test conditions (CTC) implementation and for benchmark evaluation against other generative face video coding (GFVC) frameworks.
This codebase includes components adapted from the following projects:
- First Order Motion Model for Image Animation: For the base architecture of deep image animation using unsupervised keypoints.
- CompressAI: For learned image compression.
- JPEG-AI: For evaluation metrics.
For any questions, feedback, or collaboration opportunities, feel free to contact the maintainers or open an issue in this repository.
We appreciate the contributions of the research community that enabled this work. If you use this repository or find it helpful, please consider citing the relevant papers.
If you find this project useful, give it a star on GitHub to support further development!