This repository contains a single-file reference implementation of the following publication:
On Learning Associations of Faces and Voices
Changil Kim, Hijung Valentina Shin, Tae-Hyun Oh, Alexandre Kaspar, Mohamed Elgharib, Wojciech Matusik
ACCV 2018
Paper | ArXiv | Project Website
Please cite the above paper if you use this software. See the project website for more information about the paper.
Requirements
The software runs with Python 2 or 3 and TensorFlow r1.4 or later. Additionally, it requires the NumPy, SciPy, and scikit-image packages.
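A minimal sanity check of these dependencies might look like the following sketch; it only prints the installed versions so you can confirm, for instance, that TensorFlow is r1.4 or later:

```python
from __future__ import print_function  # keeps the check compatible with Python 2 and 3

import numpy
import scipy
import skimage
import tensorflow as tf

print("TensorFlow:", tf.__version__)      # should be 1.4 or later
print("NumPy:", numpy.__version__)
print("SciPy:", scipy.__version__)
print("scikit-image:", skimage.__version__)
```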
Pre-trained models
Two pre-trained models are provided as TensorFlow checkpoints.
Download the pre-trained models and unzip them. Prepare the input facial images and voice files: facial images must be JPEG or PNG color images, and audio files must be WAV files sampled at 22,050 Hz.
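The snippet below is a minimal sketch for verifying that inputs satisfy these constraints before running the models; the file names are placeholders, and the checks use only the SciPy and scikit-image packages listed above:

```python
from __future__ import print_function

from scipy.io import wavfile
from skimage.io import imread

# Placeholder paths; substitute your own files.
rate, samples = wavfile.read("voice.wav")
assert rate == 22050, "audio must be sampled at 22,050 Hz (got %d Hz)" % rate

image = imread("face.jpg")  # JPEG or PNG
assert image.ndim == 3 and image.shape[2] >= 3, "facial images must be color images"

print("inputs look valid:", samples.shape, image.shape)
```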
Depending on the reference modality, run one of the following two commands, making sure to specify the checkpoint that matches that modality.
Given a voice, find the matching face from two candidates (v2f):