[2202.04200] MaskGIT: Masked Generative Image Transformer
Computer Science > Computer Vision and Pattern Recognition
arXiv:2202.04200 (cs)
[Submitted on 8 Feb 2022]
Title:MaskGIT: Masked Generative Image Transformer
Authors: Huiwen Chang and 4 other authors
Abstract: Generative transformers have rapidly gained popularity in the computer vision community for synthesizing high-fidelity, high-resolution images. The best generative transformer models so far, however, still treat an image naively as a sequence of tokens and decode it sequentially in raster-scan order (i.e., line by line). We find this strategy neither optimal nor efficient. This paper proposes a novel image synthesis paradigm using a bidirectional transformer decoder, which we term MaskGIT. During training, MaskGIT learns to predict randomly masked tokens by attending to tokens in all directions. At inference time, the model begins by generating all tokens of an image simultaneously and then refines the image iteratively, conditioned on the previous generation. Our experiments demonstrate that MaskGIT significantly outperforms the state-of-the-art transformer model on the ImageNet dataset and accelerates autoregressive decoding by up to 64x. In addition, we show that MaskGIT can be easily extended to various image editing tasks, such as inpainting, extrapolation, and image manipulation.
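The decoding scheme the abstract describes (start fully masked, sample all tokens in parallel, keep the most confident predictions, re-mask the rest on a shrinking schedule) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy model, the cosine schedule, and all sizes (`num_tokens`, `vocab_size`, `steps`) are assumptions for demonstration; a real MaskGIT uses a trained bidirectional transformer over VQ image tokens.

```python
import math
import random

def toy_model(tokens, masked, vocab_size, rng):
    # Stand-in for the bidirectional transformer: returns a random
    # probability distribution per position. A real model would attend
    # to the already-unmasked tokens in all directions.
    probs = []
    for _ in tokens:
        w = [rng.random() for _ in range(vocab_size)]
        s = sum(w)
        probs.append([x / s for x in w])
    return probs

def maskgit_decode(num_tokens=16, vocab_size=32, steps=8, seed=0):
    """Iterative parallel decoding: start with every token masked,
    sample all masked positions at once, keep the most confident
    predictions per a cosine schedule, and re-mask the rest."""
    rng = random.Random(seed)
    tokens = [0] * num_tokens
    masked = [True] * num_tokens  # True = still masked
    for t in range(1, steps + 1):
        probs = toy_model(tokens, masked, vocab_size, rng)
        sampled, conf = [], []
        for i, p in enumerate(probs):
            tok = rng.choices(range(vocab_size), weights=p)[0]
            sampled.append(tok)
            # Already-fixed tokens get infinite confidence so they stay fixed.
            conf.append(p[tok] if masked[i] else math.inf)
        # Accept the sampled token at every still-masked position.
        tokens = [s if m else old for s, m, old in zip(sampled, masked, tokens)]
        # Cosine schedule: fraction of positions re-masked after step t.
        n_mask = math.floor(math.cos(math.pi / 2 * t / steps) * num_tokens)
        if n_mask == 0:
            masked = [False] * num_tokens
            break
        # Re-mask the n_mask lowest-confidence positions for the next pass.
        cutoff = sorted(conf)[n_mask - 1]
        masked = [c <= cutoff for c in conf]
    return tokens, masked

tokens, masked = maskgit_decode()
```

Because each pass fixes a batch of tokens rather than one, the image is produced in `steps` forward passes instead of one pass per token, which is the source of the decoding speedup claimed in the abstract.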
| Subjects: | Computer Vision and Pattern Recognition (cs.CV) |
| Cite as: | arXiv:2202.04200 [cs.CV] |
| | (or arXiv:2202.04200v1 [cs.CV] for this version) |
| | https://doi.org/10.48550/arXiv.2202.04200 (arXiv-issued DOI via DataCite) |