SelfTalk: A Self-Supervised Commutative Training Diagram to Comprehend 3D Talking Faces [ACM MM 2023]
Official PyTorch implementation for the paper:
SelfTalk: A Self-Supervised Commutative Training Diagram to Comprehend 3D Talking Faces, ACM MM 2023.
Ziqiao Peng, Yihao Luo, Yue Shi, Hao Xu, Xiangyu Zhu, Hongyan Liu, Jun He, Zhaoxin Fan
arXiv | Project Page | License
Given a speech signal as input, our framework generates realistic 3D talking faces whose comprehensibility is demonstrated by recovering coherent textual information through the lip-reading interpreter and the speech recognizer.
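At a high level, training couples three parts: a facial animator that maps audio to per-frame meshes, a lip-reading interpreter that recovers text from the generated meshes, and a speech recognizer that recovers text from the audio itself, with the two text predictions encouraged to agree. The toy sketch below only illustrates that commutative structure; the module names, layer choices, and loss are illustrative placeholders, not the classes used in this repository.

```python
# Conceptual sketch of the commutative training idea (illustrative only; see
# main.py and the model code for the actual architecture and losses).
import torch.nn as nn

class SelfTalkSketch(nn.Module):
    def __init__(self, audio_dim=768, feature_dim=512, vertice_dim=15069, vocab_size=32):
        super().__init__()
        # audio features -> per-frame vertex offsets (facial animator)
        self.animator = nn.Sequential(
            nn.Linear(audio_dim, feature_dim), nn.ReLU(), nn.Linear(feature_dim, vertice_dim)
        )
        # generated meshes -> text logits (lip-reading interpreter)
        self.lip_reader = nn.Linear(vertice_dim, vocab_size)
        # audio features -> text logits (speech recognizer)
        self.recognizer = nn.Linear(audio_dim, vocab_size)

    def forward(self, audio_feat, template):
        verts = self.animator(audio_feat) + template       # (T, vertice_dim)
        text_from_lips = self.lip_reader(verts)            # (T, vocab_size)
        text_from_audio = self.recognizer(audio_feat)      # (T, vocab_size)
        # A consistency term between the two text predictions is what makes the
        # diagram commute: audio -> mesh -> text should match audio -> text.
        consistency = nn.functional.mse_loss(
            text_from_lips.log_softmax(-1), text_from_audio.log_softmax(-1)
        )
        return verts, consistency
```

In the released code the audio encoder is based on wav2vec 2.0 (see the acknowledgements below).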
- Linux
- Python 3.6+
- PyTorch 1.12.1
- CUDA 11.3
- ffmpeg
- MPI-IS/mesh
Clone the repo:
git clone https://github.com/psyai-net/SelfTalk_release.git
cd SelfTalk_release
Create conda environment:
conda create -n selftalk python=3.8.8
conda activate selftalk
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt
Request the VOCASET data from https://voca.is.tue.mpg.de/. Place the downloaded files data_verts.npy, raw_audio_fixed.pkl, templates.pkl and subj_seq_to_idx.pkl in the folder vocaset/. Download "FLAME_sample.ply" from voca and put it in vocaset/. Read the vertices/audio data and convert them to .npy/.wav files stored in vocaset/vertices_npy and vocaset/wav:
cd vocaset
python process_voca_data.py
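After conversion, a quick sanity check can confirm the expected layout. This is a minimal sketch (not part of the repo); it assumes each file in vocaset/vertices_npy stores a (num_frames, 15069) array, i.e. 5023 FLAME vertices × 3, matching the --vertice_dim 15069 used in the commands below.

```python
# sanity_check_vocaset.py -- minimal sketch (not part of the repo); run from vocaset/.
# Assumption: each converted .npy stores a (num_frames, 5023 * 3) float array.
import glob
import wave

import numpy as np

for npy_path in sorted(glob.glob("vertices_npy/*.npy"))[:3]:
    verts = np.load(npy_path)
    print(npy_path, verts.shape)            # expect (num_frames, 15069)
    assert verts.shape[-1] == 5023 * 3

for wav_path in sorted(glob.glob("wav/*.wav"))[:3]:
    with wave.open(wav_path) as w:
        duration = w.getnframes() / w.getframerate()
    print(wav_path, f"{duration:.2f} s")
```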
Follow BIWI/README.md to preprocess the BIWI dataset, put the .npy/.wav files into BIWI/vertices_npy and BIWI/wav, and put templates.pkl into BIWI/.
Download the pretrained models from BIWI.pth and vocaset.pth. Put the pretrained models under the BIWI and VOCASET folders, respectively. Given an audio signal:
- to animate a mesh in FLAME topology, run:
python demo_voca.py --wav_path "demo/wav/test.wav" --subject FaceTalk_170908_03277_TA
- to animate a mesh in BIWI topology, run:
python demo_BIWI.py --wav_path "demo/wav/test.wav" --subject M1
This script will automatically generate the rendered videos in the demo/output folder. You can also put your own test audio file (.wav format) under the demo/wav folder and specify the argument --wav_path "demo/wav/test.wav" accordingly.
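To animate several of your own clips in one go, a small wrapper can loop over demo/wav and call the demo script for each file. This is a hypothetical convenience script, not part of the repo; it only shells out to demo_voca.py with the arguments shown above.

```python
# run_demo_batch.py -- hypothetical wrapper (not part of the repo) that animates
# every .wav file under demo/wav/ by calling demo_voca.py once per clip.
import glob
import subprocess

SUBJECT = "FaceTalk_170908_03277_TA"  # any subject accepted by demo_voca.py

for wav_path in sorted(glob.glob("demo/wav/*.wav")):
    print(f"Animating {wav_path} ...")
    subprocess.run(
        ["python", "demo_voca.py", "--wav_path", wav_path, "--subject", SUBJECT],
        check=True,
    )
# The rendered videos should then appear under demo/output/, as described above.
```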
- Read the vertices/audio data and convert them to .npy/.wav files stored in vocaset/vertices_npy and vocaset/wav:
cd vocaset
python process_voca_data.py
- To train the model on VOCASET, run:
python main.py --dataset vocaset --vertice_dim 15069 --feature_dim 512 --period 30 --train_subjects "FaceTalk_170728_03272_TA FaceTalk_170904_00128_TA FaceTalk_170725_00137_TA FaceTalk_170915_00223_TA FaceTalk_170811_03274_TA FaceTalk_170913_03279_TA FaceTalk_170904_03276_TA FaceTalk_170912_03278_TA" --val_subjects "FaceTalk_170811_03275_TA FaceTalk_170908_03277_TA" --test_subjects "FaceTalk_170809_00138_TA FaceTalk_170731_00024_TA"
- To test the model on VOCASET, run:
python test.py --dataset vocaset --vertice_dim 15069 --feature_dim 512 --period 30 --max_epoch 100 --train_subjects "FaceTalk_170728_03272_TA FaceTalk_170904_00128_TA FaceTalk_170725_00137_TA FaceTalk_170915_00223_TA FaceTalk_170811_03274_TA FaceTalk_170913_03279_TA FaceTalk_170904_03276_TA FaceTalk_170912_03278_TA" --val_subjects "FaceTalk_170811_03275_TA FaceTalk_170908_03277_TA" --test_subjects "FaceTalk_170809_00138_TA FaceTalk_170731_00024_TA"
The results and the trained models will be saved to vocaset/result and vocaset/save.
- To visualize the results, run:
python render.py --dataset vocaset --vertice_dim 15069 --fps 30
You can find the outputs in the vocaset/output folder.
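The --vertice_dim values used throughout are simply the number of mesh vertices times 3 (x, y, z): 5023 × 3 = 15069 for the FLAME topology of VOCASET, and 23370 × 3 = 70110 for BIWI. If in doubt, you can read the count straight from a template mesh; the sketch below uses trimesh as an example loader (an extra dependency assumption; any .ply reader works).

```python
# vertice_dim_check.py -- sketch: derive --vertice_dim from a template mesh.
# trimesh is used here only as an example loader (pip install trimesh).
import trimesh

mesh = trimesh.load("vocaset/FLAME_sample.ply", process=False)
num_vertices = len(mesh.vertices)
print("vertices:", num_vertices)           # 5023 for the FLAME topology
print("--vertice_dim:", num_vertices * 3)  # 15069
```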
- Follow BIWI/README.md to preprocess the BIWI dataset.
- To train the model on BIWI, run:
python main.py --dataset BIWI --vertice_dim 70110 --feature_dim 1024 --period 25 --train_subjects "F2 F3 F4 M3 M4 M5" --val_subjects "F2 F3 F4 M3 M4 M5" --test_subjects "F2 F3 F4 M3 M4 M5"
- To test the model on BIWI, run:
python test.py --dataset BIWI --vertice_dim 70110 --feature_dim 1024 --period 25 --max_epoch 100 --train_subjects "F2 F3 F4 M3 M4 M5" --val_subjects "F2 F3 F4 M3 M4 M5" --test_subjects "F2 F3 F4 M3 M4 M5"
The results will be available in the BIWI/result folder (a quick way to compare them against ground truth is sketched after this list). The trained models will be saved in the BIWI/save folder.
- To visualize the results, run:
python render.py --dataset BIWI --vertice_dim 70110 --fps 25
The rendered videos will be available in the BIWI/output folder.
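For a quick numeric look at a trained model, you can compare the saved predictions against the ground-truth vertex sequences. This is a minimal sketch with a plain per-vertex Euclidean error, not the paper's evaluation protocol; it assumes test.py writes (num_frames, 70110) .npy files into BIWI/result named like their counterparts in BIWI/vertices_npy, so adapt the paths if the actual naming differs.

```python
# compare_predictions.py -- minimal sketch (not the paper's evaluation protocol).
# Assumption: predictions in BIWI/result are (num_frames, 70110) arrays whose
# file names match the ground-truth files in BIWI/vertices_npy.
import glob
import os

import numpy as np

for pred_path in sorted(glob.glob("BIWI/result/*.npy")):
    gt_path = os.path.join("BIWI/vertices_npy", os.path.basename(pred_path))
    if not os.path.exists(gt_path):
        continue
    pred = np.load(pred_path).reshape(-1, 23370, 3)
    gt = np.load(gt_path).reshape(-1, 23370, 3)
    n = min(len(pred), len(gt))                       # lengths may differ slightly
    err = np.linalg.norm(pred[:n] - gt[:n], axis=-1)  # per-vertex Euclidean error
    print(f"{os.path.basename(pred_path)}: mean {err.mean():.4f}, max {err.max():.4f}")
```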
- Create the dataset directory <dataset_dir> in the SelfTalk_release directory.
- Place your vertices data (.npy format) and audio data (.wav format) in the <dataset_dir>/vertices_npy and <dataset_dir>/wav folders, respectively.
- Save the templates of all subjects to a templates.pkl file and put it in <dataset_dir>, as done for BIWI and vocaset (see the sketch after this list). Export an arbitrary template to .ply format and put it in <dataset_dir>/templates/.
- Create the train, val and test splits by specifying the arguments --train_subjects, --val_subjects and --test_subjects in main.py.
- Train a SelfTalk model on your own dataset by specifying the arguments --dataset and --vertice_dim (number of vertices in your mesh * 3) in main.py. You might need to adjust --feature_dim and --period to your dataset. Run main.py.
- The results and models will be saved to <dataset_dir>/result and <dataset_dir>/save.
- Specify the arguments --dataset, --vertice_dim and --fps in render.py. Run render.py to visualize the results. The rendered videos will be saved to <dataset_dir>/output.
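A templates.pkl for a custom dataset can be assembled from per-subject .ply templates. The sketch below assumes, following the VOCASET/BIWI convention used by this pipeline, that the pickle maps each subject name to its (num_vertices, 3) template vertex array; the <dataset_dir>/templates_ply/ folder and trimesh loader are hypothetical choices for illustration.

```python
# build_templates.py -- sketch for assembling <dataset_dir>/templates.pkl.
# Assumptions: templates.pkl maps subject name -> (num_vertices, 3) float array,
# as in the BIWI and VOCASET preprocessing, and your per-subject .ply templates
# sit in a hypothetical <dataset_dir>/templates_ply/ folder.
import glob
import os
import pickle

import numpy as np
import trimesh

DATASET_DIR = "my_dataset"  # your <dataset_dir>

templates = {}
for ply_path in sorted(glob.glob(os.path.join(DATASET_DIR, "templates_ply", "*.ply"))):
    subject = os.path.splitext(os.path.basename(ply_path))[0]  # e.g. "subject01"
    mesh = trimesh.load(ply_path, process=False)
    templates[subject] = np.asarray(mesh.vertices, dtype=np.float64)

with open(os.path.join(DATASET_DIR, "templates.pkl"), "wb") as f:
    pickle.dump(templates, f)

if templates:
    num_vertices = next(iter(templates.values())).shape[0]
    print(f"wrote {len(templates)} templates; use --vertice_dim {num_vertices * 3}")
```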
If you find this work useful for your research, please cite our paper:
@article{peng2023selftalk,
title={SelfTalk: A Self-Supervised Commutative Training Diagram to Comprehend 3D Talking Faces},
author={Ziqiao Peng and Yihao Luo and Yue Shi and Hao Xu and Xiangyu Zhu and Hongyan Liu and Jun He and Zhaoxin Fan},
journal={arXiv preprint arXiv:2306.10799},
year={2023}
}
Here are some great resources we benefit from:
- Faceformer for pipeline and readme
- CodeTalker for BIWI dataset preprocessing
- FaceXHuBERT for BIWI audio processing
- B3D(AC)2 and VOCASET for the datasets
- Wav2Vec2 for audio encoder
- MPI-IS/mesh for mesh processing
- VOCA/rendering for rendering
For research purposes, please contact pengziqiao@ruc.edu.cn
For commercial licensing, please contact fanzhaoxin@psyai.net
This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License. Please read the LICENSE file for more information.
We invite you to join Psyche AI Inc to conduct cutting-edge research and business implementation together. At Psyche AI Inc, we are committed to pushing the boundaries of what's possible in the fields of artificial intelligence and computer vision, especially their applications in avatars. As a member of our team, you will have the opportunity to collaborate with talented individuals, innovate new ideas, and contribute to projects that have a real-world impact.
If you are passionate about working at the forefront of technology and making a difference, we would love to hear from you. Please visit our website at Psyche AI Inc to learn more about us and to apply for open positions. You can also contact us at fanzhaoxin@psyai.net.
Let's shape the future together!!

