This repository provides an overview of all resources for the paper "DraftAttention: Fast Video Diffusion via Low-Resolution Attention Guidance".
Draft Attention is a plug-and-play acceleration method for video diffusion transformers. It reshapes the long query and key sequences into frame-wise feature maps and applies 2D average pooling to downsample them. The resulting low-resolution draft attention then guides the sparse attention over the full-length sequence. Draft Attention introduces minimal overhead by compressing the number of tokens by 128x or more.
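To make the downsampling step concrete, here is a minimal sketch of the frame-wise pooling idea (an illustration, not the repository's implementation; the function name, shapes, and defaults are assumptions):

```python
import torch
import torch.nn.functional as F

def pool_tokens(x, num_frames, latent_h=48, latent_w=80, pool_h=8, pool_w=16):
    """Downsample visual query/key tokens with frame-wise 2D average pooling.

    x: (batch, heads, num_frames * latent_h * latent_w, dim) token sequence.
    Returns a sequence shortened by pool_h * pool_w (128x for an 8x16 kernel).
    """
    b, h, seq_len, d = x.shape
    assert seq_len == num_frames * latent_h * latent_w
    # Reshape the flat token sequence into one 2D feature map per frame.
    maps = x.reshape(b * h * num_frames, latent_h, latent_w, d).permute(0, 3, 1, 2)
    # Average-pool each frame's spatial grid, e.g. 48x80 -> 6x5 tokens.
    pooled = F.avg_pool2d(maps, kernel_size=(pool_h, pool_w))
    # Flatten back into a (much shorter) token sequence.
    return pooled.permute(0, 2, 3, 1).reshape(b, h, -1, d)
```

With `pool_h=8` and `pool_w=16`, each frame's 48 × 80 = 3,840 tokens shrink to 6 × 5 = 30, a 128x compression, which is what keeps the draft attention cheap.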
- [2025/05] We support HunyuanCustom with classifier-free guidance.
[Video comparison: Dense Attention | Sparse Video Generation (SVG) | Draft Attention (Ours)]

Prompt: "The banks of the Thames, as the camera moves vertically from low to high."
[Video comparison: Dense Attention | Sparse Video Generation (SVG) | Draft Attention (Ours)]

Prompt: "On the green grass, the white-walled Leaning Tower of Pisa stands tall. The camera moves vertically from top to bottom during filming."
[Video comparison: Dense Attention | Sparse Video Generation (SVG) | Draft Attention (Ours)]

Prompt: "A blue long dress fell from the balcony clothes rack and dropped into the water on the ground."
Prompts are all from the Penguin Video Benchmark.
Videos are generated with 90% sparsity, seed 42, using the HunyuanVideo model at 768p on an A100 GPU.
[Video comparison: Input Image | Dense Attention | Draft Attention (Ours)]

Prompt: "Realistic, High-quality. A woman is drinking coffee at a café."
Videos are generated with seed 42 at 768p resolution on 8xA100 GPUs, with either dense attention or 90% sparse attention.
Please follow the environment setup instructions and download the checkpoints from HunyuanVideo, Wan2.1, and HunyuanCustom.
We mainly adopt block sparse attention as the backend for draft attention; a rough sketch of the block-selection step is given below.
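As a simplified sketch of the guidance step (the function and parameter names below are assumptions for illustration, not the repository's API): attention over the pooled tokens yields a low-resolution score map, and its highest-scoring entries select which blocks the full-resolution block sparse attention actually computes.

```python
import torch

def draft_block_mask(q_pool, k_pool, sparsity_ratio=0.9):
    """Select the attention blocks to keep from low-resolution draft scores.

    q_pool, k_pool: (batch, heads, n_blocks, dim) pooled queries/keys, where
    each pooled token stands in for one block of full-resolution tokens.
    Returns a boolean (batch, heads, n_blocks, n_blocks) mask keeping roughly
    (1 - sparsity_ratio) of the blocks per query row.
    """
    # Low-resolution attention scores: one score per block pair.
    scores = q_pool @ k_pool.transpose(-1, -2) / q_pool.shape[-1] ** 0.5
    n_blocks = scores.shape[-1]
    k_keep = max(1, int(n_blocks * (1.0 - sparsity_ratio)))
    # Keep the highest-scoring blocks in each row; the block-sparse kernel
    # skips everything else at full resolution.
    topk = scores.topk(k_keep, dim=-1).indices
    mask = torch.zeros_like(scores, dtype=torch.bool)
    mask.scatter_(-1, topk, True)
    return mask
```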
Simply run video generation with the scripts in `hunyuan/`, `wan/`, or `hunyuan_custom/`.
Evaluation results in the paper are mainly obtained with VBench on the Penguin Video Benchmark, using HunyuanVideo and Wan2.1.
You can use draft attention in the same way as flash attention through the `Draft_Attention` class defined in `draft_attention.py` or `draft_attention_classifier_free_guidance.py`.
Here is an example for the HunyuanVideo model:

```python
from draft_attention import Draft_Attention

draft_attention = Draft_Attention(
    pool_h=8,            # pooling kernel height
    pool_w=16,           # pooling kernel width (8 x 16 = 128x token compression)
    latent_h=48,         # latent feature-map height per frame
    latent_w=80,         # latent feature-map width per frame
    visual_len=126_720,  # number of visual tokens (= latent_h * latent_w * latent frames)
    text_len=256,        # number of text tokens
    sparsity_ratio=0.9,  # skip ~90% of attention blocks
)

x = draft_attention(
    q,
    k,
    v,
    attn_mask=attn_mask,
    causal=causal,
    drop_rate=drop_rate,
    cu_seqlens_q=cu_seqlens_q,
    cu_seqlens_kv=cu_seqlens_kv,
    max_seqlen_q=max_seqlen_q,
    max_seqlen_kv=max_seqlen_kv,
    batch_size=batch_size,
)
```
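The `cu_seqlens_*` and `max_seqlen_*` arguments appear to mirror the variable-length flash-attention calling convention, so `Draft_Attention` can be swapped in at an existing flash-attention call site with the same bookkeeping; see `draft_attention.py` for the exact semantics of each argument.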
- Support any-resolution video generation with padding.
- Support token reordering for further block-sparse grouping, enabling faster hardware execution.
This work is mainly contributed by Xuan and Chenxia.
If you find Draft Attention interesting, please cite it via BibTeX:
```bibtex
@article{shen2025draft,
  title={DraftAttention: Fast Video Diffusion via Low-Resolution Attention Guidance},
  author={Shen, Xuan and Han, Chenxia and Zhou, Yufa and Xie, Yanyue and Gong, Yifan and Wang, Quanyi and Wang, Yiwei and Wang, Yanzhi and Zhao, Pu and Gu, Jiuxiang},
  journal={arXiv preprint arXiv:2505.14708},
  year={2025}
}
```