Audio-Visual Forensics
Self-Supervised Video Forensics by Audio-Visual Anomaly Detection
Manipulated videos often contain subtle inconsistencies between their visual and audio signals. We propose a video forensics method, based on anomaly detection, that can identify these inconsistencies, and that can be trained solely using real, unlabeled data. We train an autoregressive model to generate sequences of audio-visual features, using feature sets that capture the temporal synchronization between video frames and sound. At test time, we then flag videos that the model assigns low probability. Despite being trained entirely on real videos, our model obtains strong performance on the task of detecting manipulated speech videos.
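
The idea of training on real data only and flagging low-probability videos can be illustrated with a small sketch. This is not the authors' model: it swaps in a first-order Markov (bigram) model with add-one smoothing as a toy stand-in for the autoregressive model, and it assumes each video has already been reduced to a sequence of discrete synchronization features (here, a quantized audio-visual time delay per frame, with bin 2 meaning "in sync"). The function names `fit_bigram` and `avg_log_prob`, the toy sequences, and the threshold margin are all hypothetical.

```python
import math
from collections import Counter


def fit_bigram(sequences, vocab_size, alpha=1.0):
    """Fit a smoothed first-order model and return log P(cur | prev).

    Toy stand-in for an autoregressive model; trained on real sequences only.
    """
    pair_counts = Counter()
    context_counts = Counter()
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1

    def log_prob(prev, cur):
        # Additive (Laplace) smoothing so unseen transitions get nonzero mass.
        num = pair_counts[(prev, cur)] + alpha
        den = context_counts[prev] + alpha * vocab_size
        return math.log(num / den)

    return log_prob


def avg_log_prob(seq, log_prob):
    """Average per-step log-probability of a sequence under the model."""
    steps = list(zip(seq, seq[1:]))
    return sum(log_prob(p, c) for p, c in steps) / len(steps)


# Hypothetical "real" training videos: the delay bin stays close to 2.
real_videos = [
    [2, 2, 2, 2, 3, 2, 2, 2],
    [2, 2, 1, 2, 2, 2, 2, 2],
    [2, 2, 2, 2, 2, 2, 3, 2],
]
log_prob = fit_bigram(real_videos, vocab_size=5)

# Assumed decision rule: flag anything scoring well below the real videos.
real_scores = [avg_log_prob(v, log_prob) for v in real_videos]
threshold = min(real_scores) - 0.5  # hypothetical margin

test_real = [2, 2, 2, 3, 2, 2, 2, 2]  # stable delay, as in real footage
test_fake = [0, 4, 1, 3, 0, 4, 2, 0]  # erratic delay, as in a manipulation

score_real = avg_log_prob(test_real, log_prob)
score_fake = avg_log_prob(test_fake, log_prob)
is_flagged_real = score_real < threshold
is_flagged_fake = score_fake < threshold

print(f"real: {score_real:.3f} flagged={is_flagged_real}")
print(f"fake: {score_fake:.3f} flagged={is_flagged_fake}")
```

No manipulated examples are used to fit the model or to choose the threshold; the erratic sequence is flagged only because its transitions are unlikely under statistics learned from real sequences.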
Time Delay Visualization (figure not reproduced here)
Video Demo (video not reproduced here)
Chao Feng, Ziyang Chen, Andrew Owens. Self-Supervised Video Forensics by Audio-Visual Anomaly Detection. CVPR 2023. (arXiv)
Acknowledgements
We thank David Fouhey, Richard Higgins, Sarah Jabbour, Yuexi Du, Mandela Patrick, Deva Ramanan, Haochen Wang, and Aayush Bansal for helpful discussions. This work was supported in part by DARPA SemaFor and Cisco Systems. The views, opinions, and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government. The webpage template was originally made by Phillip Isola and Richard Zhang for a colorization project.