DFM

Moayed Haji-Ali* Willi Menapace* Ivan Skorokhodov Arpit Sahni Sergey Tulyakov Vicente Ordonez Aliaksandr Siarohin

Snap Research

TL;DR: Decomposable Flow Matching (DFM) is a simple framework for progressively generating visual modalities scale-by-scale, achieving up to 50% faster convergence than Flow Matching. Read the paper on arXiv for more details.

Method

Decomposable Flow Matching (DFM): A generative model that combines multiscale decomposition with Flow Matching. DFM progressively synthesizes the different representation scales, generating the coarse-structure scale first and incrementally refining it with finer scales.
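
The coarse-to-fine decomposition can be illustrated with a short sketch. This is not the authors' code: it assumes a Laplacian-pyramid-style decomposition built from 2x average pooling and nearest-neighbour upsampling, and the rectified-flow convention x_t = (1 - t) x + t ε with velocity target ε - x. The helper names (`decompose`, `reconstruct`, `interpolate_scales`) and the per-scale timesteps are hypothetical.

```python
import numpy as np

# Assumption (not from the source): Laplacian-style pyramid using
# 2x average pooling and 2x nearest-neighbour upsampling.


def downsample(x: np.ndarray) -> np.ndarray:
    """2x average-pool an (H, W, C) array; H and W must be even."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))


def upsample(x: np.ndarray) -> np.ndarray:
    """2x nearest-neighbour upsampling of an (H, W, C) array."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def decompose(x: np.ndarray, num_scales: int) -> list[np.ndarray]:
    """Split x into scales, returned coarse-first: [coarse structure, ..., finest residual]."""
    components = []
    current = x
    for _ in range(num_scales - 1):
        low = downsample(current)
        components.append(current - upsample(low))  # detail lost by downsampling
        current = low
    components.append(current)  # lowest-resolution coarse structure
    return components[::-1]


def reconstruct(components: list[np.ndarray]) -> np.ndarray:
    """Invert decompose(): upsample the running estimate and add each finer residual."""
    x = components[0]
    for residual in components[1:]:
        x = upsample(x) + residual
    return x


def interpolate_scales(components, timesteps, rng):
    """Hypothetical per-scale flow matching pairs: x_t = (1 - t) * x + t * eps, target eps - x."""
    noisy, targets = [], []
    for x, t in zip(components, timesteps):
        eps = rng.standard_normal(x.shape)
        noisy.append((1.0 - t) * x + t * eps)
        targets.append(eps - x)
    return noisy, targets


rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 3))
scales = decompose(image, num_scales=3)
print([s.shape for s in scales])  # [(8, 8, 3), (16, 16, 3), (32, 32, 3)]
# Hypothetical per-scale timesteps: coarse scale almost clean, finest scale pure noise.
noisy, targets = interpolate_scales(scales, timesteps=[0.1, 0.6, 1.0], rng=rng)
```

Because the decomposition is exactly invertible, a model that generates every scale can always be mapped back to a full-resolution image with `reconstruct`.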

DFM Architecture: Our framework (DFM) progressively synthesizes images by combining multiscale decomposition with Flow Matching. We modify the DiT architecture to use per-scale patchification and timestep-embedding layers while keeping the core DiT architecture untouched.
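
One way to picture per-scale input layers in front of an unchanged DiT backbone is the sketch below. It is hypothetical and not confirmed by this page: it assumes each scale gets its own patch size, chosen so that every scale yields the same token grid, its own linear patch projection, and its own timestep projection, and that the per-scale tokens and embeddings are summed before the shared DiT blocks (not shown). Weights are random toy values; `PerScaleInputs` and its parameters are illustrative names.

```python
import numpy as np


def patchify(x: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) array into non-overlapping patches -> (num_patches, patch*patch*C)."""
    h, w, c = x.shape
    x = x.reshape(h // patch, patch, w // patch, patch, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * c)


def timestep_embedding(t: float, dim: int) -> np.ndarray:
    """Sinusoidal embedding of a timestep t in [0, 1] (dim must be even)."""
    freqs = np.exp(-np.log(10000.0) * np.arange(dim // 2) / (dim // 2))
    angles = t * 1000.0 * freqs  # scale t to the 0..1000 range common in DiT-style models
    return np.concatenate([np.cos(angles), np.sin(angles)])


class PerScaleInputs:
    """Hypothetical per-scale patchification and timestep layers (toy random weights)."""

    def __init__(self, base_patch, num_scales, channels, hidden, rng):
        # Assumption: patch size halves at each coarser scale (coarse-first order), so all
        # scales give the same token grid; base_patch must be divisible by 2**(num_scales - 1).
        self.patches = [base_patch // 2 ** (num_scales - 1 - s) for s in range(num_scales)]
        self.patch_proj = [rng.standard_normal((p * p * channels, hidden)) * 0.02 for p in self.patches]
        self.time_proj = [rng.standard_normal((hidden, hidden)) * 0.02 for _ in range(num_scales)]
        self.hidden = hidden

    def __call__(self, components, timesteps):
        # Assumption: per-scale tokens and timestep embeddings are summed.
        tokens = sum(patchify(x, p) @ w for x, p, w in zip(components, self.patches, self.patch_proj))
        cond = sum(timestep_embedding(t, self.hidden) @ w for t, w in zip(timesteps, self.time_proj))
        return tokens, cond  # both would be passed to the unchanged DiT blocks


rng = np.random.default_rng(0)
components = [rng.standard_normal((n, n, 3)) for n in (8, 16, 32)]  # coarse-first
layers = PerScaleInputs(base_patch=4, num_scales=3, channels=3, hidden=64, rng=rng)
tokens, cond = layers(components, timesteps=[0.1, 0.6, 1.0])
print(layers.patches, tokens.shape, cond.shape)  # [1, 2, 4] (64, 64) (64,)
```

Keeping the token count fixed across scales is one simple way to leave the transformer blocks themselves untouched; other token-merging choices are equally plausible.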

Results

Across image and video generation, DFM outperforms the best-performing baselines, matching the Fréchet DINO Distance (FDD) of Flow Matching baselines with up to 2x less training compute.
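
FDD is the Fréchet distance between Gaussians fitted to image features from a DINO-family encoder: d² = ||μ_a - μ_b||² + Tr(Σ_a + Σ_b - 2(Σ_a Σ_b)^{1/2}). The sketch below computes only this distance from two precomputed (N, D) feature arrays; feature extraction is assumed to happen elsewhere, and the function names are illustrative.

```python
import numpy as np


def _sqrtm_psd(a: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(a)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Squared Fréchet distance between Gaussians fitted to two (N, D) feature sets."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    sqrt_a = _sqrtm_psd(cov_a)
    # Tr((cov_a cov_b)^{1/2}) equals Tr((sqrt_a cov_b sqrt_a)^{1/2}), whose argument is symmetric PSD.
    m = sqrt_a @ cov_b @ sqrt_a
    cross = _sqrtm_psd((m + m.T) / 2.0)  # symmetrize to remove rounding asymmetry
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * np.trace(cross))


rng = np.random.default_rng(0)
real = rng.standard_normal((500, 8))        # stand-in for encoder features of real images
fake = rng.standard_normal((500, 8)) + 0.5  # stand-in for features of generated images
print(frechet_distance(real, fake))
```

Lower is better; "matching the FDD" means reaching the same value of this distance with less training compute.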

Qualitative Results

Large-Scale Finetuning: Finetuning FLUX-dev with DFM (FLUX-DFM) achieves better results than standard full finetuning (DFM-FT) for the same training compute.

Training From Scratch for Image Generation: When trained from scratch on ImageNet-1k 512px, DFM achieves better quality than baselines using the same training resources.

Training From Scratch for Video Generation: DFM is also suited for video generation, achieving better structural and visual quality than baselines when trained on the Kinetics-700 dataset with the same compute budget.

Ablations: We found that DFM benefits from more sampling steps in the coarse-structure stage and needs only a few in the high-frequency stage. It is also largely insensitive to the choice of the per-stage sampling noise threshold, especially at high CFG values.
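
The step-allocation finding can be illustrated with a coarse-to-fine Euler sampler. This is a hedged sketch under assumptions this page does not confirm: each stage integrates one scale from noise (t = 1) to data (t = 0) with its own step budget, conditioned on the scales already generated, and the per-stage noise threshold from the ablation is not modelled. `oracle_velocity` is a toy stand-in for the trained model whose exact answer is known, so the sampler's output can be checked.

```python
import numpy as np


def euler_integrate(velocity_fn, x, num_steps, context):
    """Integrate dx/dt = v(x, t, context) from t=1 (noise) to t=0 (data) with uniform Euler steps."""
    ts = np.linspace(1.0, 0.0, num_steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        x = x + (t_next - t_cur) * velocity_fn(x, t_cur, context)
    return x


def progressive_sample(velocity_fn, shapes, steps_per_stage, rng):
    """Hypothetical coarse-to-fine sampler: one stage per scale, each with its own step budget."""
    generated = []
    for shape, num_steps in zip(shapes, steps_per_stage):
        noise = rng.standard_normal(shape)
        # Earlier (coarser) scales are passed as conditioning context and kept fixed.
        generated.append(euler_integrate(velocity_fn, noise, num_steps, context=list(generated)))
    return generated


def oracle_velocity(x, t, context):
    """Toy velocity for x_t = (1 - t) * c + t * eps with constant data c = len(context)."""
    target = np.full(x.shape, float(len(context)))
    return (x - target) / t  # equals eps - c; t is never 0 inside euler_integrate


rng = np.random.default_rng(0)
scales = progressive_sample(oracle_velocity, [(8, 8, 3), (16, 16, 3)], steps_per_stage=[24, 6], rng=rng)
print([float(s.mean()) for s in scales])  # approximately [0.0, 1.0]
```

Giving the coarse stage the larger share of steps (24 vs. 6 here, purely illustrative numbers) mirrors the reported finding that the coarse-structure stage benefits most from extra sampling steps.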

Citation

If you find this paper useful in your research, please consider citing our work:

@article{dfm,
  title={Improving Progressive Generation with Decomposable Flow Matching},
  author={Moayed Haji-Ali and Willi Menapace and Ivan Skorokhodov and Arpit Sahni and Sergey Tulyakov and Vicente Ordonez and Aliaksandr Siarohin},
  journal={arXiv preprint arXiv:2506.19839},
  year={2025}
}
