- FastCache-xDiT
- QuickStart
- Supported DiTs
- Performance Comparison
- Technical Details
- xDiT's Parallel Methods
- Single GPU Acceleration
- Develop Guide
- Cite Us
FastCache-xDiT is a novel plug-and-play acceleration method for Diffusion Transformers (DiTs) that exploits computational redundancies across both spatial and temporal dimensions. With zero training required and minimal quality impact, FastCache can deliver significant speedups (up to 1.7x) on modern DiT models while being fully compatible with existing parallel inference methods.
- Plug-and-Play: Drop-in acceleration with no model modifications required
- Adaptive Computation: Dynamically adjusts caching behavior based on model hidden states
- Spatial-Temporal Awareness: Intelligently identifies redundant computations in both dimensions
- Memory Efficient: Reduces peak memory usage by avoiding unnecessary computations
- Compatible with Parallel Methods: Can be combined with USP, PipeFusion, and other xDiT parallel techniques
FastCache introduces a hidden-state-level caching and compression framework with two core components:
- Spatial Token Reduction Module - Adaptively identifies and processes only tokens with significant changes
- Transformer-Level Caching Module - Uses statistical tests to determine when entire transformer blocks can be skipped
FastCache delivers significant speedups across popular DiT models:
| Model | Baseline | FastCache | TeaCache | First-Block-Cache |
|---|---|---|---|---|
| Flux.1 | 9.8s | 6.2s (1.6x) | 7.1s (1.4x) | 7.5s (1.3x) |
| PixArt Sigma | 10.6s | 6.7s (1.6x) | 6.9s (1.5x) | 6.8s (1.6x) |
FastCache-xDiT operates on two levels, using learnable parameters to approximate redundant computations:
FastCache computes a motion-aware saliency metric by comparing hidden states between timesteps:

$$S_t = \|H_t - H_{t-1}\|_2^2$$

Each token is classified as either *motion* or *static* by comparing its saliency against a threshold $\tau_s$: motion tokens receive full transformer processing, while static tokens reuse cached computation. This spatial token reduction significantly reduces computation by applying full transformer processing only to tokens with significant changes.
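Below is a minimal PyTorch sketch of this partition step (the function name `partition_tokens` and the `[batch, tokens, dim]` tensor layout are illustrative assumptions, not the repo's actual implementation):

```python
import torch

def partition_tokens(h_t: torch.Tensor, h_prev: torch.Tensor,
                     motion_threshold: float = 0.1) -> torch.Tensor:
    """Split tokens into motion/static sets via per-token saliency.

    h_t, h_prev: hidden states of shape [batch, tokens, dim].
    Returns a boolean mask (True = motion token) of shape [batch, tokens].
    """
    # Token-wise squared L2 distance between consecutive timesteps
    saliency = (h_t - h_prev).pow(2).sum(dim=-1)
    # Tokens whose saliency exceeds the threshold are treated as motion tokens
    return saliency > motion_threshold
```

Motion tokens then pass through the full transformer blocks, while static tokens reuse cached values.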
For each transformer block, FastCache computes a relative change metric between current and previous hidden states:

$$\delta_{t,l} = \frac{\|H_{t,l-1} - H_{t-1,l-1}\|_F}{\|H_{t-1,l-1}\|_F}$$

Under statistical assumptions, this metric follows a scaled chi-square distribution: $ND\,\delta_{t,l}^2 \sim \chi^2_{ND}$, where $N$ is the number of tokens and $D$ the hidden dimension. FastCache applies a cache decision rule: for confidence level $1-\alpha$, the block is skipped whenever

$$\delta_{t,l}^2 \leq \frac{\chi^2_{ND,1-\alpha}}{ND}$$

Instead of computing the full transformer block, FastCache uses a block-specific learnable linear projection:

$$H_{t,l} = W_l H_{t,l-1} + b_l$$

This provides a statistically sound method to decide when hidden states can be reused, while the learnable parameters ensure output quality is maintained.
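The decision rule is straightforward to express in code. Here is a hedged sketch (the helper name `should_skip_block` is ours; `scipy.stats.chi2.ppf` supplies the chi-square quantile):

```python
import torch
from scipy.stats import chi2

def should_skip_block(h_cur: torch.Tensor, h_prev: torch.Tensor,
                      alpha: float = 0.05) -> bool:
    """Return True when the relative change is small enough that the block
    can be replaced by its learnable linear approximation."""
    nd = h_cur.numel()  # N tokens * D hidden dims
    # Relative Frobenius-norm change between consecutive timesteps
    delta = torch.linalg.norm(h_cur - h_prev) / torch.linalg.norm(h_prev)
    # Skip when delta^2 <= chi^2_{ND, 1-alpha} / ND
    return delta.item() ** 2 <= chi2.ppf(1.0 - alpha, df=nd) / nd
```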
FastCache includes an adaptive thresholding mechanism that adjusts the cache threshold based on the denoising timestep and the variance of the hidden states, so the skip criterion is stricter while hidden states are still changing rapidly and more permissive once they stabilize.
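The exact schedule is defined in the paper; purely as an illustration of the idea, a threshold of the following shape (all names and coefficients here are hypothetical placeholders) loosens the criterion as denoising progresses and tightens it when hidden-state variance is high:

```python
def adaptive_threshold(step: int, num_steps: int, hidden_variance: float,
                       base: float = 0.05, beta_t: float = 0.02,
                       beta_v: float = 0.01) -> float:
    """Illustrative-only schedule; coefficients are placeholders,
    not values from the FastCache paper."""
    progress = step / max(num_steps, 1)
    return base * (1.0 + beta_t * progress) / (1.0 + beta_v * hidden_variance)
```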
```
Algorithm: FastCache
Input:  Hidden state H_t, previous hidden state H_{t-1},
        transformer blocks {Block_l}, thresholds τ_s, α
Output: Processed hidden state H_t^L

1. Compute token-wise saliency S_t ← ||H_t - H_{t-1}||_2^2
2. Partition tokens into motion tokens X_t^m and static tokens X_t^s based on τ_s
3. Initialize H_{t,0} ← Concat(X_t^m, X_t^s)
4. For l = 1 to L:
   a. δ_{t,l} ← ||H_{t,l-1} - H_{t-1,l-1}||_F / ||H_{t-1,l-1}||_F
   b. If δ_{t,l}^2 ≤ χ^2_{ND,1-α} / ND:
      i. H_{t,l} ← W_l H_{t,l-1} + b_l    (linear approximation)
   c. Else:
      i. H_{t,l} ← Block_l(H_{t,l-1})     (full computation)
5. Return H_t^L
```
This approach provides significant speedups (up to 1.7x) with minimal impact on generation quality by intelligently skipping redundant computations at both the token and transformer block levels.
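Putting both levels together, a compact PyTorch sketch of the per-block loop might look as follows (the class name `FastCacheBlockWrapper` is illustrative; the repo's real logic lives behind `xFuserFastCachePipelineWrapper`):

```python
import torch
from torch import nn
from scipy.stats import chi2

class FastCacheBlockWrapper(nn.Module):
    """Wrap one transformer block with a learnable linear shortcut (W_l, b_l)."""

    def __init__(self, block: nn.Module, dim: int, alpha: float = 0.05):
        super().__init__()
        self.block = block
        self.proj = nn.Linear(dim, dim)  # W_l, b_l from the algorithm
        self.alpha = alpha
        self.prev_input = None           # cached H_{t-1, l-1}

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        skip = False
        if self.prev_input is not None and self.prev_input.shape == h.shape:
            nd = h.numel()
            delta = torch.linalg.norm(h - self.prev_input) / torch.linalg.norm(self.prev_input)
            skip = delta.item() ** 2 <= chi2.ppf(1.0 - self.alpha, df=nd) / nd
        self.prev_input = h.detach()
        # Cheap linear approximation when the change is statistically
        # negligible, full block computation otherwise
        return self.proj(h) if skip else self.block(h)
```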
```bash
pip install xfuser                          # basic installation
pip install "xfuser[diffusers,flash-attn]"  # with diffusers and flash attention
```
```python
from xfuser.model_executor.pipelines.fastcache_pipeline import xFuserFastCachePipelineWrapper
from diffusers import PixArtSigmaPipeline

# Load your diffusion model
model = PixArtSigmaPipeline.from_pretrained("PixArt-alpha/PixArt-Sigma-XL-2-1024-MS")

# Create FastCache wrapper
fastcache_wrapper = xFuserFastCachePipelineWrapper(model)

# Enable FastCache with optional parameters
fastcache_wrapper.enable_fastcache(
    cache_ratio_threshold=0.05,  # relative change threshold for caching
    motion_threshold=0.1,        # threshold for motion saliency
)

# Run inference with FastCache acceleration
result = fastcache_wrapper(
    prompt="a photo of an astronaut riding a horse on the moon",
    num_inference_steps=30,
)

# Get cache statistics
stats = fastcache_wrapper.get_cache_statistics()
print(stats)
```
Run FastCache with PixArt Sigma:
```bash
# Basic usage
python examples/run_fastcache_test.py \
    --model_type pixart \
    --model "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS" \
    --prompt "a photo of an astronaut riding a horse on the moon" \
    --num_inference_steps 30 \
    --cache_method "Fast" \
    --cache_ratio_threshold 0.05 \
    --motion_threshold 0.1

# Compare different cache methods with the convenience benchmark script
./examples/run_fastcache_benchmark.sh pixart
```
Run FastCache with Flux model:
```bash
# Basic usage
python examples/run_fastcache_test.py \
    --model_type flux \
    --model "black-forest-labs/FLUX.1-schnell" \
    --prompt "a serene landscape with mountains and a lake" \
    --num_inference_steps 30 \
    --cache_method "Fast" \
    --cache_ratio_threshold 0.05 \
    --motion_threshold 0.1

# Using the convenience benchmark script
./examples/run_fastcache_benchmark.sh flux
```
To benchmark several cache methods in one run:

```bash
python benchmark/cache_execute.py \
    --model_type pixart \
    --cache_methods None Fast Fb Tea \
    --num_inference_steps 20 \
    --height 512 \
    --width 512 \
    --output_dir cache_results

# Quick low-resolution test with the xfuser-integrated benchmark
python benchmark/cache_execution_xfuser.py \
    --model_type pixart \
    --cache_methods Fast \
    --num_inference_steps 5 \
    --height 256 \
    --width 256 \
    --output_dir test_results
```
| Argument | Description | Default |
|---|---|---|
| `--model_type` | Model type (`pixart`, `flux`) | `pixart` |
| `--model` | Model path or name | `PixArt-alpha/PixArt-Sigma-XL-2-1024-MS` |
| `--prompt` | Text prompt for image generation | `a photo of an astronaut riding a horse on the moon` |
| `--num_inference_steps` | Number of inference steps | `30` |
| `--cache_method` | Cache method (`None`, `Fast`, `Fb`, `Tea`) | `Fast` |
| `--seed` | Random seed | `42` |
| `--height` | Image height | `768` |
| `--width` | Image width | `768` |
| `--cache_ratio_threshold` | Cache ratio threshold | `0.05` |
| `--motion_threshold` | FastCache motion threshold | `0.1` |
| `--output_dir` | Output directory for results | `fastcache_test_results` |
Compare FastCache with other acceleration methods:
```bash
# Run on PixArt Sigma
./examples/run_fastcache_benchmark.sh pixart

# Run on Flux model
./examples/run_fastcache_benchmark.sh flux
```
The benchmark will:
- Run the baseline model without acceleration (`cache_method="None"`)
- Run with FastCache acceleration (`cache_method="Fast"`)
- Run with First-Block-Cache acceleration (`cache_method="Fb"`)
- Run with TeaCache acceleration (`cache_method="Tea"`)
- Generate comparison images for quality assessment
- Create performance statistics and cache hit ratio charts
- Generate a comprehensive HTML report with all comparisons
All results will be saved to the `fastcache_benchmark_results` directory, making it easy to compare the different caching methods in terms of both performance and output quality.
FastCache can be combined with xDiT's parallel methods for even greater speedups:
```python
from xfuser import xFuserArgs
from xfuser.parallel import xDiTParallel

# Enable FastCache in your pipeline
fastcache_wrapper = xFuserFastCachePipelineWrapper(model)
fastcache_wrapper.enable_fastcache()

# Apply parallel inference (USP, PipeFusion, etc.);
# engine_args comes from xFuserArgs (see the sketch below)
engine_config, input_config = engine_args.create_config()
paralleler = xDiTParallel(fastcache_wrapper, engine_config, input_config)

# Run inference with both FastCache and parallel acceleration
result = paralleler(prompt="your prompt", num_inference_steps=30)
```
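For completeness, here is a hedged sketch of how `engine_config` and `input_config` are typically obtained via xDiT's `xFuserArgs` CLI helper (the flag values are examples only):

```python
import argparse
from xfuser import xFuserArgs

parser = argparse.ArgumentParser()
xFuserArgs.add_cli_args(parser)  # registers xDiT's model/parallelism flags
args = parser.parse_args([
    "--model", "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",
    "--ulysses_degree", "2",     # example: 2-way sequence parallelism (USP)
])
engine_args = xFuserArgs.from_cli_args(args)
engine_config, input_config = engine_args.create_config()
```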
| Model Name | FastCache | CFG | SP | PipeFusion | Performance Report Link |
|---|---|---|---|---|---|
| Flux | ✔️ | NA | ✔️ | ✔️ | Report |
| PixArt-Sigma | ✔️ | ✔️ | ✔️ | ✔️ | Report |
| HunyuanDiT-v1.2-Diffusers | ✔️ | ✔️ | ✔️ | ✔️ | Report |
| StepVideo | ✔️ | NA | ✔️ | ❌ | Report |
| PixArt-alpha | ✔️ | ✔️ | ✔️ | ✔️ | Report |
| ConsisID-Preview | ✔️ | ✔️ | ✔️ | ❌ | Report |
| CogVideoX1.5 | ✔️ | ✔️ | ✔️ | ❌ | Report |
FastCache-xDiT is fully compatible with the parallel acceleration methods provided by xDiT. In addition, this repo offers multiple cache-based acceleration methods:
- **FastCache**: Our adaptive spatial-temporal caching method, which uses motion-aware token reduction and statistical caching to exploit computational redundancies. Read more about FastCache.
- **TeaCache**: Memory-friendly caching that exploits redundancies between adjacent denoising steps.
- **First-Block-Cache**: Caches the output of early transformer blocks across timesteps.
FastCache also works with video generation models. Here are examples of how to use it with different video DiT models:
```bash
# Using FastCache with StepVideo
python benchmark/video_cache_execute.py \
    --model_type stepvideo \
    --prompt "a dog running in a field" \
    --cache_methods Fast \
    --num_inference_steps 20 \
    --num_frames 16 \
    --height 256 \
    --width 256 \
    --cache_ratio_threshold 0.15

# Using FastCache with CogVideoX
python benchmark/video_cache_execute.py \
    --model_type cogvideox \
    --prompt "a dog running in a field" \
    --cache_methods Fast \
    --num_inference_steps 20 \
    --num_frames 16

# Using FastCache with ConsisID
python benchmark/video_cache_execute.py \
    --model_type consisid \
    --prompt "a time lapse of a blooming flower" \
    --cache_methods Fast \
    --num_inference_steps 20 \
    --num_frames 16

# Compare all cache methods (None, Fast, Fb, Tea) on video generation
python benchmark/video_cache_execute.py \
    --model_type stepvideo \
    --cache_methods All \
    --num_frames 16 \
    --num_inference_steps 20
```
We provide a step-by-step guide for adding new models; please refer to the following tutorial.
If you use FastCache-xDiT in your research or applications, please cite:
```bibtex
@inproceedings{liu2025fastcachecvpr,
  title={FastCache: Cache What Matters, Skip What Doesn't},
  author={Liu, Dong and Zhang, Jiayi and Li, Yifan and Yu, Yanxuan and Lengerich, Ben and Wu, Ying Nian},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  year={2025}
}

@article{liu2025fastcache,
  title={FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation},
  author={Liu, Dong and Zhang, Jiayi and Li, Yifan and Yu, Yanxuan and Lengerich, Ben and Wu, Ying Nian},
  journal={arXiv preprint arXiv:2505.20353},
  year={2025}
}
```
For questions about FastCache-xDiT, please contact dong.liu.dl2367@yale.edu.