LipNeRF: What is the right feature space to lip-sync a NeRF?
Synthesizing high-fidelity talking-head videos of an arbitrary identity, lip-synced to a target speech segment, is a challenging problem. Recent GAN-based methods succeed by training on a large number of videos, allowing the generator to learn a variety of audio-lip representations; however, they cannot handle head pose changes. Neural Radiance Fields (NeRFs), on the other hand, model the 3D face geometry more accurately, but current audio-conditioned NeRFs lag behind GANs in lip synchronization, since they are trained on limited video data of a single identity. In this work, we propose LipNeRF, a lip-syncing NeRF that bridges the gap between the accurate lip synchronization of GAN-based methods and the accurate 3D face modeling of NeRFs. LipNeRF is conditioned on the expression space of a 3DMM instead of the audio feature space. We experimentally demonstrate that the expression space provides a better representation of lip shape than the audio feature space. LipNeRF shows a significant improvement in lip-sync quality over the current state of the art, especially on high-definition videos of cinematic content with challenging pose, illumination, and expression variations.
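The core idea — conditioning the radiance field on a 3DMM expression code rather than an audio feature — can be illustrated with a minimal sketch. This is not the authors' implementation; the network sizes, the 79-dimensional expression code, and all function names below are illustrative assumptions. It shows only the structural point: each queried 3D point is positionally encoded and concatenated with a per-frame expression vector, so the rendered lip shape is driven by the 3DMM expression space.

```python
import numpy as np

def positional_encoding(x, num_freqs=6):
    """Standard NeRF sinusoidal encoding of a 3D point (illustrative)."""
    freqs = 2.0 ** np.arange(num_freqs)           # (num_freqs,)
    angles = x[None, :] * freqs[:, None]          # (num_freqs, 3)
    return np.concatenate([np.sin(angles), np.cos(angles)]).ravel()

class ExpressionConditionedNeRF:
    """Tiny MLP: (encoded point, 3DMM expression code) -> (density, RGB).

    Hypothetical sketch: expr_dim=79 stands in for a 3DMM expression
    coefficient vector; real models are much larger.
    """
    def __init__(self, expr_dim=79, hidden=64, num_freqs=6, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = 2 * num_freqs * 3 + expr_dim     # encoding + expression code
        self.w1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.w2 = rng.normal(0.0, 0.1, (hidden, 4))   # sigma + RGB

    def query(self, point, expr_code):
        # The expression code enters alongside the point encoding, replacing
        # the audio feature used by audio-conditioned NeRFs.
        feat = np.concatenate([positional_encoding(point), expr_code])
        h = np.maximum(feat @ self.w1, 0.0)       # ReLU hidden layer
        out = h @ self.w2
        sigma = np.log1p(np.exp(out[0]))          # softplus: density >= 0
        rgb = 1.0 / (1.0 + np.exp(-out[1:]))      # sigmoid: colour in (0, 1)
        return sigma, rgb
```

At render time, each frame would supply its fitted expression coefficients, so lip-syncing reduces to regressing per-frame expression codes from the target speech and feeding them to the field.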
Acknowledgements