We introduce a novel scheme, Learning-to-Cache (L2C), that learns to perform layer caching dynamically for diffusion transformers (NeurIPS 2024). A router is optimized to decide which layers should be cached at each sampling step.
(Changes in the router for U-ViT during optimization, across different layers (x-axis) and all steps (y-axis). White indicates that a layer is activated; black indicates that it is disabled.)
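The mechanism can be sketched roughly as follows. This is a minimal, self-contained illustration in PyTorch, not the repository's implementation: `Block`, `CachedTransformer`, and the hard-coded schedule are hypothetical, and the actual L2C router is trained end-to-end with a differentiable relaxation rather than supplied as a fixed binary mask.

```python
# Minimal sketch of router-gated layer caching (hypothetical names, not the repo code).
import torch
import torch.nn as nn

class Block(nn.Module):
    """Stand-in for a DiT/U-ViT transformer block (heavily simplified)."""
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        return x + self.mlp(self.norm(x))

class CachedTransformer(nn.Module):
    def __init__(self, depth, dim):
        super().__init__()
        self.blocks = nn.ModuleList([Block(dim) for _ in range(depth)])
        self.cache = [None] * depth  # each layer's residual from its last recomputation

    def forward(self, x, layer_mask):
        # layer_mask[i] == 1: recompute layer i; == 0: reuse its cached residual.
        for i, (block, active) in enumerate(zip(self.blocks, layer_mask)):
            if active or self.cache[i] is None:
                self.cache[i] = block(x) - x  # residual contributed by this layer
            x = x + self.cache[i]
        return x

model = CachedTransformer(depth=4, dim=64)
x = torch.randn(2, 16, 64)
# Toy schedule over three sampling steps: recompute everything on the first step,
# then skip (cache) layers 1 and 3 on the following steps.
for step_mask in [[1, 1, 1, 1], [1, 0, 1, 0], [1, 0, 1, 0]]:
    x = model(x, step_mask)
```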
Some takeaways:
- A large proportion of the layers in a diffusion transformer can be removed without updating the model parameters.
- In U-ViT-H/2, up to 93.68% of the layers in the cache steps (46.84% over all steps) can be removed, with less than a 0.01 drop in FID (see the sketch after this list for how such a mask translates into skipped computation).
- L2C largely outperforms samplers such as DDIM and DPM-Solver.
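As a back-of-the-envelope illustration of how percentages like the ones above arise, the sketch below thresholds a randomly generated, hypothetical router-score matrix into a binary recompute mask and measures the fraction of layer evaluations skipped. The threshold value, the matrix shape, and the assumption that every other step is a cache step are all illustrative, not the paper's exact settings.

```python
import torch

# Hypothetical post-training step: threshold the learned per-(cache step, layer)
# router scores into a binary mask (1 = recompute the layer, 0 = reuse its cache).
num_steps, num_layers = 50, 28                        # illustrative sizes
num_cache_steps = num_steps // 2                      # assume every other step is a cache step
router_scores = torch.rand(num_cache_steps, num_layers)  # stand-in for learned scores
beta = 0.5                                            # illustrative threshold

recompute = router_scores > beta
skipped_on_cache_steps = 1.0 - recompute.float().mean().item()
skipped_overall = skipped_on_cache_steps * num_cache_steps / num_steps
print(f"skipped on cache steps: {skipped_on_cache_steps:.1%}, over all steps: {skipped_overall:.1%}")
```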
(Comparison with Baselines. Left: DiT-XL/2. Right: U-ViT-H/2)
@misc{ma2024learningtocache,
  title={Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching},
  author={Xinyin Ma and Gongfan Fang and Michael Bi Mi and Xinchao Wang},
  year={2024},
  eprint={2406.01733},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}