Latent Consistency Models
Synthesizing High-Resolution Images with Few-step Inference
Institute for Interdisciplinary Information Sciences, Tsinghua University
" LCMs: The next generation of generative models after Latent Diffusion Models (LDMs). "
Abstract
We propose Latent Consistency Models (LCMs) to overcome the slow iterative sampling process of Latent Diffusion Models (LDMs), enabling fast inference with minimal steps on any pre-trained LDM (e.g., Stable Diffusion).
Viewing the guided reverse diffusion process as solving an augmented probability flow ODE (PF-ODE), LCMs predict its solution directly in latent space, achieving very fast inference in only a few steps.
A high-quality 768×768 LCM, distilled from Stable Diffusion, requires only 32 A100 GPU-hours of training (8 A100 GPUs for only 4 hours) for 2–4-step inference.
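To make the PF-ODE view concrete, here is a schematic of the two objects involved, written in LaTeX. It is a sketch only: the notation follows the arXiv paper in spirit, but the skip/output scalings of the consistency parameterization are omitted. The classifier-free guidance (CFG) scale ω augments the noise prediction that drives the PF-ODE, and the consistency function f_θ maps any point on a trajectory of that ODE directly to its endpoint:

% CFG-augmented noise prediction (drives the augmented PF-ODE)
\tilde{\epsilon}_\theta(z_t, \omega, c, t) = (1 + \omega)\,\epsilon_\theta(z_t, c, t) - \omega\,\epsilon_\theta(z_t, \varnothing, t)
% Consistency function and its boundary condition at the smallest timestep \epsilon
f_\theta(z_t, \omega, c, t) \approx z_0 \quad \text{for all } t \in [\epsilon, T], \qquad f_\theta(z_\epsilon, \omega, c, \epsilon) = z_\epsilon

A few-step sampler simply evaluates f_θ to jump to an estimate of z_0, re-noises to a smaller timestep, and repeats 1–4 times before decoding with the LDM's VAE decoder.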
Few-Step Generated Images
Images generated by Latent Consistency Models (LCMs). LCMs can be distilled from any pre-trained Stable Diffusion (SD) model in only 4,000 training steps (~32 A100 GPU-hours) to generate high-quality 768×768 images in 2–4 steps, or even a single step, significantly accelerating text-to-image generation. We employ LCM to distill the Dreamshaper-V7 version of SD in just 4,000 training iterations.
Image panels: 4-Step Inference, 2-Step Inference, 1-Step Inference.
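As a rough illustration of how few-step sampling looks in practice, the following is a minimal sketch using the Hugging Face diffusers library. The checkpoint name SimianLuo/LCM_Dreamshaper_v7, the guidance scale, and the exact pipeline arguments are assumptions tied to a recent diffusers release rather than something specified on this page; check the documentation for your installed version.

import torch
from diffusers import DiffusionPipeline

# Assumption: a recent diffusers release with built-in LCM support, plus the
# community checkpoint "SimianLuo/LCM_Dreamshaper_v7" distilled from Dreamshaper-V7.
pipe = DiffusionPipeline.from_pretrained(
    "SimianLuo/LCM_Dreamshaper_v7", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

prompt = "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k"

# LCMs need only a handful of denoising steps; guidance is distilled into the
# model via a CFG-scale embedding, so guidance_scale stays modest.
image = pipe(
    prompt,
    num_inference_steps=4,  # try 1, 2, or 4 to reproduce the panels above
    guidance_scale=8.0,
    height=768,
    width=768,
).images[0]

image.save("lcm_4_step.png")

Dropping num_inference_steps to 2 or 1 trades some detail for speed, mirroring the panels above.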
More Generation Results (4-Step)
More generation results with LCM 4-step inference (768×768 resolution). We employ LCM to distill the Dreamshaper-V7 version of SD in just 4,000 training iterations.
More Generation Results (2-Step)
More generation results with LCM 2-step inference (768×768 resolution). We employ LCM to distill the Dreamshaper-V7 version of SD in just 4,000 training iterations.
Latent Consistency Fine-tuning (LCF)
LCF is a fine-tuning method designed for pretrained LCMs. It enables efficient few-step inference on customized datasets without requiring a teacher diffusion model, making it viable to directly fine-tune a pretrained LCM on such data.
4-step LCMs trained with Latent Consistency Fine-tuning (LCF) on two customized datasets: the Pokemon dataset (left) and the Simpsons dataset (right). Through LCF, LCM produces images in customized styles.
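Schematically, and as a paraphrase rather than a formula taken from this page, LCF optimizes a consistency objective directly on noised latents of the custom data, with an exponential-moving-average copy θ⁻ of the model as the target; the distance d, the timestep schedule {t_n}, and the CFG scale ω follow the paper:

\mathcal{L}_{\mathrm{LCF}}(\theta, \theta^-) = \mathbb{E}_{z_0, c, n}\Big[\, d\big( f_\theta(z_{t_{n+1}}, \omega, c, t_{n+1}),\; f_{\theta^-}(\hat{z}_{t_n}, \omega, c, t_n) \big) \Big]

Here z_{t_{n+1}} is a forward-diffused version of the encoded training image z_0, and \hat{z}_{t_n} is estimated from the same z_0 and the sampled noise rather than from a teacher's ODE-solver step, which is why no teacher diffusion model is required.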
BibTeX
@misc{luo2023latent,
title={Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference},
author={Simian Luo and Yiqin Tan and Longbo Huang and Jian Li and Hang Zhao},
year={2023},
eprint={2310.04378},
archivePrefix={arXiv},
primaryClass={cs.CV}
}