Toward Multimodal Image-to-Image Translation
J.-Y. Zhu¹, R. Zhang¹, D. Pathak¹, T. Darrell¹, A. A. Efros¹, O. Wang², E. Shechtman²
¹Berkeley Artificial Intelligence Research
²Adobe Creative Intelligence Laboratory
Abstract
Many image-to-image translation problems are ambiguous, as a single input image
may correspond to multiple possible outputs. In this work, we aim to model
a distribution of possible outputs in a conditional generative modeling setting.
The ambiguity of the mapping is distilled in a low-dimensional latent vector,
which can be randomly sampled at test time. A generator learns to map the given
input, combined with this latent code, to the output. We explicitly encourage the
connection between output and the latent code to be invertible. This helps prevent
a many-to-one mapping from the latent code to the output during training, also
known as the problem of mode collapse, and produces more diverse results. We
explore several variants of this approach by employing different training objectives,
network architectures, and methods of injecting the latent code. Our proposed
method encourages bijective consistency between the latent encoding and output
modes. We present a systematic comparison of our method and other variants on
both perceptual realism and diversity.
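As a minimal sketch of the invertibility idea described above (not the authors' exact training code; the generator handle `G`, encoder handle `E`, and the 8-dimensional code are assumptions for illustration), the constraint can be written as a latent-regression penalty: sample a code, generate an output, re-encode it, and compare against the original code.

```python
import torch
import torch.nn.functional as F

def latent_regression_loss(G, E, x, z_dim=8):
    """Sketch of the invertibility constraint: the encoder E should
    recover the latent code z that the generator G consumed, which
    discourages many-to-one (mode-collapsed) mappings."""
    z = torch.randn(x.size(0), z_dim, device=x.device)  # randomly sampled code
    y_hat = G(x, z)             # generator: (input image, code) -> output image
    z_hat = E(y_hat)            # encoder: output image -> recovered code
    return F.l1_loss(z_hat, z)  # penalize failure to invert the code
```

In this reading, driving the penalty toward zero makes the code recoverable from the output, which is the bijective consistency between latent encoding and output modes that the abstract refers to.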
Video: mp4 [258 MB]
Example Results
[Figure: example results gallery]
Exploring the Latent Space
Try the BicycleGAN model
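To make the sampling procedure concrete, here is a hypothetical test-time loop (the generator handle `G` and the helper name are illustrative, not part of the released interface): each freshly drawn latent code yields a different plausible output for the same input image.

```python
import torch

@torch.no_grad()
def sample_outputs(G, x, num_samples=5, z_dim=8):
    """Draw several random latent codes for one input image and
    return the corresponding batch of diverse translations."""
    outputs = []
    for _ in range(num_samples):
        z = torch.randn(x.size(0), z_dim, device=x.device)  # fresh code per sample
        outputs.append(G(x, z))                             # same input, new output mode
    return torch.cat(outputs, dim=0)
```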
Paper
J.Y. Zhu, R. Zhang, D. Pathak, T. Darrell, A. A. Efros, O. Wang, E. Shechtman.
Toward Multimodal Image-to-Image Translation.
In NIPS, 2017. (hosted on arXiv)
Poster
Related Work
Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros. Image-to-Image Translation with Conditional Adversarial Networks. In CVPR, 2017. [PDF] [Website]
Jun-Yan Zhu*, Taesung Park*, Phillip Isola, Alexei A. Efros. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. In ICCV, 2017. [PDF] [Website]
Acknowledgements
We thank Phillip Isola and Tinghui Zhou for helpful discussions. This work was supported in part by Adobe Inc., DARPA, AFRL, DoD MURI award N000141110688, NSF awards IIS-1633310, IIS-1427425, IIS-1212798, the Berkeley Artificial Intelligence Research (BAIR) Lab, and hardware donations from NVIDIA. JYZ is supported by the Facebook Graduate Fellowship, RZ by the Adobe Research Fellowship, and DP by the NVIDIA Graduate Fellowship.