Code and pretrained DiffiT models will be released soon!
DiffiT achieves a new SOTA FID score of 1.73 on the ImageNet-256 dataset!
In addition, DiffiT sets a new SOTA FID score of 2.22 on the FFHQ-64 dataset!
We introduce a new Time-dependent Multihead Self-Attention (TMSA) mechanism that jointly learns spatial and temporal dependencies and allows for attention conditioning with fine-grained control.
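To illustrate the idea behind TMSA, here is a minimal NumPy sketch in which the queries, keys, and values are linear projections of both the spatial tokens and a time-step embedding, so attention is conditioned on the diffusion time step. All names (`tmsa`, the `Wqs`/`Wqt`-style weight matrices) are illustrative assumptions, not the released implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def tmsa(x, t, params, num_heads=4):
    """Time-dependent multihead self-attention (illustrative sketch).

    x: (n, d) spatial tokens; t: (1, d) time-step embedding.
    Each of q, k, v mixes a spatial projection of x with a temporal
    projection of t, so the attention pattern depends on the time step.
    """
    n, d = x.shape
    q = x @ params["Wqs"] + t @ params["Wqt"]  # broadcast time token to all positions
    k = x @ params["Wks"] + t @ params["Wkt"]
    v = x @ params["Wvs"] + t @ params["Wvt"]
    dh = d // num_heads
    out = np.empty_like(q)
    for h in range(num_heads):
        sl = slice(h * dh, (h + 1) * dh)
        attn = softmax(q[:, sl] @ k[:, sl].T / np.sqrt(dh))
        out[:, sl] = attn @ v[:, sl]
    return out

rng = np.random.default_rng(0)
d = 16
params = {name: rng.standard_normal((d, d)) * 0.1
          for name in ["Wqs", "Wqt", "Wks", "Wkt", "Wvs", "Wvt"]}
x = rng.standard_normal((8, d))   # 8 spatial tokens
t = rng.standard_normal((1, d))   # time-step embedding
y = tmsa(x, t, params)
print(y.shape)  # (8, 16)
```

Because the time token enters the q/k/v projections directly (rather than only shifting the input), changing the time step reshapes the attention weights themselves.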
💥 News 💥
[07.01.2024] 🔥🔥 DiffiT has been accepted to ECCV 2024!
[04.02.2024] Updated manuscript now available on arXiv!
[12.04.2023] 🔥 Paper is now available on arXiv!
Benchmarks
Latent Space
ImageNet-256
| Model | Dataset | Resolution | FID-50K | Inception Score |
|---|---|---|---|---|
| Latent DiffiT | ImageNet | 256x256 | 1.73 | 276.49 |
ImageNet-512
| Model | Dataset | Resolution | FID-50K | Inception Score |
|---|---|---|---|---|
| Latent DiffiT | ImageNet | 512x512 | 2.67 | 252.12 |
Image Space
| Model | Dataset | Resolution | FID-50K |
|---|---|---|---|
| DiffiT | CIFAR-10 | 32x32 | 1.95 |
| DiffiT | FFHQ-64 | 64x64 | 2.22 |
Citation
@inproceedings{hatamizadeh2025diffit,
title={{DiffiT}: Diffusion Vision Transformers for Image Generation},
author={Hatamizadeh, Ali and Song, Jiaming and Liu, Guilin and Kautz, Jan and Vahdat, Arash},
booktitle={European Conference on Computer Vision},
pages={37--55},
year={2025},
organization={Springer}
}