CNN-generated images are surprisingly easy to spot... for now
Are CNN-generated images hard to distinguish from real images? We show that a classifier trained to detect images generated by only one CNN (ProGAN, far left) can detect those generated by many other models (remaining columns).

[Oct 18 2021 Update] Our method gets 92% AUC on the recently released StyleGAN3 model! For more details, please visit this link.
Abstract
In this work we ask whether it is possible to create a "universal" detector for telling apart real images from those generated by a CNN, regardless of the architecture or dataset used. To test this, we collect a dataset of fake images generated by 11 different CNN-based image generator models, chosen to span the space of commonly used architectures today (ProGAN, StyleGAN, BigGAN, CycleGAN, StarGAN, GauGAN, DeepFakes, cascaded refinement networks, implicit maximum likelihood estimation, second-order attention super-resolution, and seeing-in-the-dark). We demonstrate that, with careful pre- and post-processing and data augmentation, a standard image classifier trained on only one specific CNN generator (ProGAN) generalizes surprisingly well to unseen architectures, datasets, and training methods (including the just-released StyleGAN2). Our findings suggest the intriguing possibility that today's CNN-generated images share some common systematic flaws, preventing them from achieving realistic image synthesis.
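For readers who want a concrete picture of what such a detector looks like, the following is a minimal sketch, not the authors' released code (that is linked under "Code and Models" below): an ImageNet-pretrained ResNet-50 fine-tuned as a binary real-vs-fake classifier, with random Gaussian blur and JPEG re-compression applied as augmentation. The data layout ("data/train/real" vs. "data/train/fake"), augmentation probabilities, and hyperparameters here are illustrative assumptions, not the paper's exact settings.

```python
# Minimal sketch of a real-vs-fake image classifier with blur/JPEG augmentation.
# Hypothetical layout: data/train/real/*.png and data/train/fake/*.png
import io
import random

import torch
import torch.nn as nn
from PIL import Image, ImageFilter
from torchvision import datasets, models, transforms


def random_blur_jpeg(img: Image.Image) -> Image.Image:
    """Randomly blur and/or JPEG re-compress a PIL image (augmentation sketch)."""
    if random.random() < 0.5:  # Gaussian blur half the time
        img = img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0.0, 3.0)))
    if random.random() < 0.5:  # JPEG re-compression half the time
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=random.randint(30, 100))
        buf.seek(0)
        img = Image.open(buf).convert("RGB")
    return img


train_tf = transforms.Compose([
    transforms.Lambda(random_blur_jpeg),
    transforms.RandomCrop(224, pad_if_needed=True),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

# ImageFolder labels sub-directories alphabetically: fake -> 0, real -> 1.
train_set = datasets.ImageFolder("data/train", transform=train_tf)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 1)  # one logit: sigmoid(logit) = P(real)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.BCEWithLogitsLoss()

model.train()
for images, labels in loader:  # one pass over the data, for brevity
    optimizer.zero_grad()
    loss = criterion(model(images).squeeze(1), labels.float())
    loss.backward()
    optimizer.step()
```

The released repository contains the actual training and evaluation code along with pretrained weights; the sketch above is only meant to make the recipe in the abstract concrete.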
Discussion
Despite the alarm that has been raised by the rapidly improving quality of image synthesis methods, our results suggest that today's CNN-generated images retain detectable fingerprints that distinguish them from real photos. This allows forensic classifiers to generalize from one model to another without extensive adaptation.
However, this does not mean that the current situation will persist. Due to the difficulties in achieving Nash equilibria, none of the current GAN-based architectures are optimized to convergence, i.e., the generator never wins against the discriminator. Were this to change, we would suddenly find ourselves in a situation where synthetic images are completely indistinguishable from real ones.
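As background (this is the standard GAN objective of Goodfellow et al., 2014, recalled here for context rather than a result of this work), the paragraph above refers to the following minimax game; at its global optimum the generator matches the data distribution and even an ideal discriminator is reduced to guessing:

```latex
% Standard GAN minimax game between generator G and discriminator D:
\min_G \max_D \; V(D, G) =
    \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\!\left[\log\left(1 - D(G(z))\right)\right]
% At the (rarely reached) global optimum, p_G = p_data and the best achievable
% discriminator is D*(x) = 1/2 for every x, i.e. real and synthetic images
% would be indistinguishable even in principle.
```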
Even with the current techniques, there remain practical reasons for concern. First, even the best forensics detector will have some trade-off between true-detection and false-positive rates. Since a malicious user is typically looking to create a single fake image (rather than a distribution of fakes), they could simply hand-pick the fake image that happens to pass the detection threshold. Second, malicious use of fake imagery is likely to be deployed on a social media platform (Facebook, Twitter, YouTube, etc.), so the data will undergo a number of often aggressive transformations (compression, resizing, re-sampling, etc.). While we demonstrated robustness to some degree of JPEG compression, blurring, and resizing, much more work is needed to evaluate how well current detectors cope with these transformations in the wild. Finally, most documented instances of effective deployment of visual fakes to date have used classic "shallow" methods, such as Photoshop. We have experimented with running our detector on the face-aware liquify dataset from [Wang et al. ICCV 2019], and found that our method performs at chance on this data. This suggests that shallow methods exhibit fundamentally different behavior than deep methods, and should not be neglected.
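To make the first point concrete, here is a small self-contained sketch using made-up score distributions (not numbers from the paper): once an operating threshold is chosen to cap the false-positive rate on real images, some fraction of fakes necessarily scores below it, and an attacker who needs only one convincing image can pick from that fraction.

```python
# Threshold trade-off sketch with hypothetical detector scores (higher = "more fake").
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
real_scores = rng.normal(loc=0.0, scale=1.0, size=10_000)   # scores on real photos
fake_scores = rng.normal(loc=2.5, scale=1.0, size=10_000)   # scores on generated images

labels = np.concatenate([np.zeros_like(real_scores), np.ones_like(fake_scores)])
scores = np.concatenate([real_scores, fake_scores])
fpr, tpr, thresholds = roc_curve(labels, scores)

# Operating point: allow at most 1% false positives on real images.
i = np.searchsorted(fpr, 0.01, side="right") - 1
print(f"threshold={thresholds[i]:.2f}  FPR={fpr[i]:.3f}  TPR={tpr[i]:.3f}")
print(f"fraction of fakes slipping under the threshold: {1 - tpr[i]:.3f}")
```

The same scaffolding can be reused to see how the operating point degrades when the inputs are first JPEG re-compressed or resized, which is the second concern raised above.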
We note that detecting fake images is just one small piece of the puzzle of how to combat the threat of visual disinformation. Effective solutions will need to incorporate a wide range of strategies, from technical to social to legal.
Video
Code and Models [GitHub]
Paper
S.-Y. Wang, O. Wang, R. Zhang, A. Owens, A. A. Efros. CNN-generated images are surprisingly easy to spot... for now. In CVPR, 2020 (oral presentation). [Paper]
Acknowledgements
