Building on the success of Dream 7B, we introduce Dream-VL and Dream-VLA, open vision-language (VL) and vision-language-action (VLA) models that fully unlock discrete diffusion's advantages in long-horizon planning, bidirectional reasoning, and parallel action generation for multimodal tasks.
Key Results:
- **Dream-VL**: Achieves state-of-the-art performance among diffusion VLMs, comparable to top-tier AR VLMs trained on open data, with superior performance on visual planning tasks that require long-horizon reasoning.
- **Dream-VLA**: Achieves top-tier performance with a 97.2% average on LIBERO, 71.4% on SimplerEnv-Bridge, and 60.5% on SimplerEnv-Fractal, surpassing leading models including GR00T-N1 and OpenVLA-OFT. It consistently outperforms AR baselines across diverse fine-tuning objectives.
Repository Structure

The exact structure may evolve; please refer to the repo for up-to-date details.

```
Dream-VLX/
├── Dream-VL/     # Dream-VL training and evaluation (preparing)
├── Dream-VLA/    # Dream-VLA training and evaluation (preparing)
└── README.md     # This file
```
Citation
```bibtex
@article{ye2025dreamvla,
  title={Dream-VL \& Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone},
  author={Ye, Jiacheng and Gong, Shansan and Gao, Jiahui and Fan, Junming and Wu, Shuang and Bi, Wei and Bai, Haoli and Shang, Lifeng and Kong, Lingpeng},
  journal={arXiv preprint},
  year={2025}
}
```