News

Nov 6th, 2025: New preprint on BFM-Zero: A Promptable Behavioral Foundation Model for Humanoid Control Using Unsupervised Reinforcement Learning! Proud to have conducted the majority of the real-world experiments and the skill-composition work with my teammates! Check it out on arXiv, the project page, and GitHub.

Oct 29th, 2025: New preprint on πRL: Online RL Fine-tuning for Flow-based Vision-Language-Action Models! Check it out on arXiv, GitHub, and HuggingFace.

Oct 24th, 2025: New blog post: Understanding the Architecture of Flow VLAs (π0 and π0.5). Check it out here.

Oct 19th, 2025: My blog post on the Fokker-Planck equation is online! Check it out here.

Sep 19th, 2025: ReinFlow has been accepted at NeurIPS 2025. Huge thanks to all the collaborators!

Aug 11th, 2025: Started my new research journey at CMU RI.

Jun 26th, 2025: ReinFlow is now online! This work demonstrates how to fine-tune flow policies with reinforcement learning. Check it out on arXiv, HuggingFace, GitHub, and WandB.

Talks

Jun 16th, 2025: Delivered a talk at Tsinghua University on “Designing Robot Learning Pipelines with Flow Policies and Reinforcement Learning.”

Research

* Denotes equal contribution. $ Denotes core contributors. Representative works are highlighted.

Preprints

  • BFM-Zero: A Promptable Behavioral Foundation Model for Humanoid Control Using Unsupervised Reinforcement Learning
    Yitang Li*$, Zhengyi Luo*$, Tonghe Zhang$, Cunxi Dai$, Anssi Kanervisto, Andrea Tirinzoni, Haoyang Weng, Kris Kitani, Mateusz Guzek, Ahmed Touati, Alessandro Lazaric, Matteo Pirotta, Guanya Shi.
    Preprint.
    TL;DR: Building a behavior foundation model for multi-task humanoid loco-manipulation via unsupervised sim-to-real RL.
  • πRL: Online RL Fine-tuning for Flow-based Vision-Language-Action Models
    Kang Chen*, Zhihao Liu*, Tonghe Zhang*, Zhen Guo, Si Xu, Hao Lin, Hongzhi Zang, Quanlu Zhang, Zhaofei Yu, Guoliang Fan, Tiejun Huang, Yu Wang, Chao Yu.
    Preprint.
    TL;DR: Online RL fine-tuning for flow-based VLAs (π0, π0.5) with policy gradients, achieving near-perfect success rates on LIBERO and 90%+ success rates on ManiSkill via heterogeneous parallel simulation across 320 environments.
  • SAC Flow: Sample-Efficient Reinforcement Learning of Flow Policies via Sequence Modeling
    Yixian Zhang*, Shu'ang Yu*, Tonghe Zhang, Mo Guang, Haojia Hui, Kaiwen Long, Yu Wang, Chao Yu, Wenbo Ding.
    Preprint.
    TL;DR: Training flow-matching policies via off-policy RL by treating the sampling process as an RNN/Transformer.

Publications

  • ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning
    Tonghe Zhang, Chao Yu, Sichang Su, Yu Wang.
    The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025).
    TL;DR: Injecting learnable noise to fine-tune flow-matching policies and VLAs with policy gradients.