♻️ Cache Me if You Can: Accelerating Diffusion Models through Block Caching
Work done during Felix's internship at Meta GenAI
Abstract
Diffusion models have recently revolutionized the field of image synthesis due to their ability to generate photorealistic images. However, one of the major drawbacks of diffusion models is that the image generation process is costly. A large image-to-image network has to be applied many times to iteratively refine an image from random noise. While many recent works propose techniques to reduce the number of required steps, they generally treat the underlying denoising network as a black box. In this work, we investigate the behavior of the layers within the network and find that 1) the layers' output changes smoothly over time, 2) the layers show distinct patterns of change, and 3) the change from step to step is often very small. We hypothesize that many layer computations in the denoising network are redundant. Leveraging this, we introduce Block Caching, in which we reuse outputs from layer blocks of previous steps to speed up inference. Furthermore, we propose a technique to automatically determine caching schedules based on each block's changes over timesteps. In our experiments, we show through FID, human evaluation, and qualitative analysis that Block Caching enables generating images with higher visual quality at the same computational cost. We demonstrate this for different state-of-the-art models (LDM and EMU) and solvers (DDIM and DPM).
Method
Analysis
We observe that in diffusion models, not only the intermediate results x but also the internal feature maps change smoothly over time. (a) We visualize output feature maps of two layer blocks within the denoising network via PCA. Structures change smoothly, at different rates. (b) We also observe this smooth layer-wise change when plotting the change in output from one step to the next, averaged over many different prompts and randomly initialized noise. Besides the average, we also show the standard deviation as a shaded area. The patterns always remain the same. (Configuration: LDM-512, DPM, 20 steps.)
We make three key observations:
- Smooth change over time. Similarly to the intermediate images during denoising, the blocks change smoothly and gradually over time. This suggests that there is a clear temporal relation between the outputs of a block.
- Distinct patterns of change. The different blocks do not behave uniformly over time. Rather, they change substantially during certain periods of the denoising process while remaining nearly inactive in others. The standard deviation shows that this behavior is consistent across different images and random seeds. Note that some blocks, for example the blocks at higher resolutions (either very early or very late in the network), change most in the last 20% of the denoising process, while deeper blocks at lower resolutions change more in the beginning.
- Small step-to-step difference. Almost every block has significant periods during the denoising process in which its output changes only very little (we sketch such a change metric below).
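As a concrete illustration, the change plotted above can be measured as a relative L1 distance between a block's outputs at consecutive denoising steps. The following is a minimal PyTorch sketch of such a metric; the function name relative_l1_change is our own illustration, not the released code.

import torch

def relative_l1_change(prev_out: torch.Tensor, curr_out: torch.Tensor) -> float:
    # Relative L1 change of a block's output between two consecutive
    # denoising steps. Values near zero indicate that the block's output
    # is nearly static, so a cached value would be a good substitute.
    num = (curr_out - prev_out).abs().sum()
    den = curr_out.abs().sum().clamp_min(1e-8)  # guard against division by zero
    return (num / den).item()

Averaging these per-step values over many prompts and random seeds yields stable per-block change curves like those shown above.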
Block Caching
We hypothesize that many layer blocks perform redundant computations during steps in which their outputs change very little. To reduce these redundant computations and to speed up inference, we propose Block Caching.
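A minimal sketch of the mechanism, assuming the denoising network exposes its layer blocks as modules and the sampling loop passes the current step index through. CachedBlock and recompute_steps are illustrative names for this sketch, not the released implementation.

import torch
import torch.nn as nn

class CachedBlock(nn.Module):
    # Wraps one layer block of the denoising network. At steps contained
    # in recompute_steps the wrapped block runs normally and its output is
    # stored; at all other steps the stored output is reused as-is.

    def __init__(self, block: nn.Module, recompute_steps):
        super().__init__()
        self.block = block
        self.recompute_steps = recompute_steps
        self.cache = None

    def forward(self, x, step):
        if self.cache is None or step in self.recompute_steps:
            self.cache = self.block(x)
        return self.cache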
Automatic Cache Schedule
Not every block should be cached all the time. To make a more informed decision about when and where to cache, we rely on the change metrics visualized above. Our intuition is that for any layer block i, we retain a cached value, computed at time step t_a, as long as the accumulated change does not exceed a certain threshold δ. Once the threshold is exceeded at time step t_b, we recompute the block's output.
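Sketched in code, the schedule for one block can be derived offline from its averaged change curve: accumulate the per-step change since the last recomputation and trigger a recompute whenever the sum crosses δ. cache_schedule is a hypothetical helper implementing exactly this rule.

def cache_schedule(change_per_step, delta):
    # Return the set of steps at which the block must be recomputed.
    # Between two recompute steps t_a and t_b, the value cached at t_a is
    # reused because the accumulated change stays at or below delta.
    recompute = {0}  # the first step always computes the block
    accumulated = 0.0
    for t in range(1, len(change_per_step)):
        accumulated += change_per_step[t]
        if accumulated > delta:
            recompute.add(t)
            accumulated = 0.0
    return recompute

A single threshold δ thus yields a different schedule for every block: blocks with flat change curves are recomputed rarely, while highly active blocks are recomputed at almost every step.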
Scale Shift Adjustment
To enable the model to adjust to using cached values, we introduce a very lightweight scale-shift adjustment mechanism wherever we apply caching. To this end, we add a timestep-dependent scalar shift and scale parameter for each layer that receives a cached input.
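One possible realization, again only a sketch: a learnable scalar scale and shift per denoising step, applied to the cached feature map before it is passed on. The module name ScaleShift and the surrounding training setup are assumptions for illustration.

import torch
import torch.nn as nn

class ScaleShift(nn.Module):
    # Timestep-dependent affine adjustment for cached values: one scalar
    # scale and one scalar shift per denoising step, i.e. only
    # 2 * num_steps extra parameters per adjusted layer.

    def __init__(self, num_steps):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(num_steps))
        self.shift = nn.Parameter(torch.zeros(num_steps))

    def forward(self, cached, step):
        return self.scale[step] * cached + self.shift[step]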
Results
EMU + Caching
Given a fixed computational budget, we can perform more denoising steps and obtain higher-quality results. Here, we compare EMU with our caching approach at 20 steps against the default setup at 14 steps. At identical inference speed, our caching technique produces finer details and more vibrant colors.
Example prompts:
- A magical portal opening to reveal a hidden realm of wonders.
- A tranquil garden with cherry blossoms in full bloom under a full moon.
- An ancient castle on a cliff overlooking a vast, mist-covered valley.
- A yellow tiger with blue stripes.
- A time-traveling wizard riding a mechanical steed through a portal, leaving trails of stardust in their wake.
- A floating city in the clouds where airships navigate through tunnels of light, and majestic creatures soar in the skies.
Quantitative Results
We conduct a human evaluation study on the visual appeal of images generated with the caching configuration versus the baseline without caching. We always compare configurations with the same latency.
LDM + Caching
We show different configurations for the common LDM architecture. The caching configurations at 20 steps and the baseline configuration at 14 steps have the same latency. The baseline with 20 steps is about 1.5x slower. Our method often provides richer colors and finer details. Through our scale-shift adjustment, we avoid artifacts that are visible when naively applying block caching.
Quantitative Results
For different solvers, we test our caching technique against baselines with 1) the same number of steps or 2) the same latency. In all cases, our proposed approach achieves a significant speedup while improving visual quality, as measured by FID on a COCO subset with all faces removed (for privacy reasons). Legend: SS = scale-shift adjustment, Img/s. = images per second.
BibTeX
@article{wimbauer2023cache,
  title={Cache Me if You Can: Accelerating Diffusion Models through Block Caching},
  author={Wimbauer, Felix and Wu, Bichen and Schoenfeld, Edgar and Dai, Xiaoliang and Hou, Ji and He, Zijian and Sanakoyeu, Artsiom and Zhang, Peizhao and Tsai, Sam and Kohler, Jonas and others},
  journal={arXiv preprint arXiv:2312.03209},
  year={2023}
}