Embodied World Models for Decision Making
NeurIPS 2025 Workshop, San Diego
Dec 6 (Whole-Day Workshop)
Room: Upper Level Room 30A-E
Overview
World models infer and predict real-world dynamics by modeling the external environment, and have become a cornerstone of embodied artificial intelligence, powering recent progress in decision-making and planning for interacting agents. This workshop aims to bring together researchers working at the intersection of generative modeling, reinforcement learning, computer vision, and robotics to explore the next generation of embodied world models: models that enable agents to understand, predict, and interact with the world through learned dynamics. By focusing on embodiment and decision-making, we seek to advance world models beyond passive sequence prediction, toward active, goal-directed interaction with both physical and simulated worlds.
Topics of Interest
We welcome contributions that advance theoretical foundations, algorithmic innovations, or real-world applications of world models. Topics of interest include (but are not limited to):
- Model-based reinforcement learning and long-horizon planning. Investigating how world models can benefit model-based reinforcement learning with a focus on sample efficiency, performance, and scalability. Particular attention is given to long-horizon planning, which requires the agent to reason over extended sequences of actions, anticipate delayed outcomes, and maintain coherent strategies across temporally distant states and goals, often under uncertainty and limited feedback.
- Aligning simulation and real-world physics for robot learning. Investigating how to bridge the gap between simulated and real-world physics to enhance robot learning. This includes using generative models to improve perception, planning, and control by capturing physical dynamics more accurately, modeling uncertainty and feedback effects, and learning diffusion-based policies that transfer robustly from simulation to the real world.
- Interactive scene generation and downstream tasks. Building models that generate physically plausible and semantically coherent interactive video simulations. Focus areas include action-conditioned scene synthesis, controllable simulation of agent-environment dynamics, and the development of evaluation techniques and benchmarks that assess video fidelity, temporal consistency, and task-relevant controllability for downstream applications such as planning and policy learning.
- Video-language-action (VLA) models and leveraging the world knowledge encoded in large language models (LLMs). Studying large-scale pretrained models that unify video, language, and action representations to support robust and generalizable policy learning. Core areas include curating diverse multi-modal datasets, improving cross-modal alignment, developing parameter-efficient fine-tuning methods, and enabling agents to follow complex, language-guided instructions in both simulated and real-world settings. We also explore how the structured and unstructured world knowledge embedded in large language models can be exploited to guide agents’ decision-making.
- Applications in broader domains, such as open-world video games and autonomous driving. Extending world models to embodied agents in both real-world environments and high-fidelity simulators. Key topics include integrating perception with control, sim-to-real transfer, continual learning and adaptation, and deploying agents in open-ended tasks such as Minecraft, autonomous driving, and interactive real-world scenarios.
Call for Papers
We invite submissions of original research papers related to building physically plausible world models.
Submission Types:
- Opinion Papers (max 4 pages with unlimited references, NeurIPS format) - Papers that propose new visions, future directions, or highlight challenges and opportunities in embodied world models for decision-making, without requiring extensive experimental results.
- Research Papers (4 to 9 pages with unlimited references and appendices, NeurIPS format) - For original research contributions.
Submission Guidelines:
- Submit your paper via OpenReview
- Please follow the style guidelines of NeurIPS 2025.
- Papers are non-archival - we welcome submissions that have been submitted to or accepted by other venues.
- Papers submitted to the workshop will be reviewed in a double-blind process.
- For opinion papers, the title should state the opinion and start with "Opinion:", such as "Opinion: Large Language Models Should Not Replace Peer Review in Scientific Publishing".
- Papers already accepted to NeurIPS 2025 will undergo an expedited review process primarily evaluating their relevance to the workshop themes.
- All accepted papers will be presented in a poster session.
Important Dates:
- Submission Deadline: September 1, 2025 11:59PM UTC-0
- New Submission Deadline: September 3, 2025 11:59PM UTC-0
- Notification of Acceptance: September 17, 2025 UTC-0
- New Notification of Acceptance: September 20, 2025 UTC-0
- Camera Ready Deadline: October 26, 2025 11:59PM UTC-0
Schedule
- 08:00 - 08:45 Poster Session I.
- 08:45 - 08:50 Opening Remarks.
- 08:50 - 09:30 Keynote #1 Elias Bareinboim (Columbia University) - Towards Causal Artificial Intelligence
- 09:30 - 10:00 Keynote #2 Sanja Fidler (University of Toronto & NVIDIA) - Towards World Models for Autonomous Driving
- 10:00 - 10:30 Keynote #3 Nicklas Hansen (UC San Diego) - Massively Multitask World Models for Continuous Control
- 10:30 - 11:00 Keynote #4 Chelsea Finn (Stanford University & Physical Intelligence) - Developing Long-Term Autonomy
- 11:00 - 11:30 Keynote #5 Peter Stone (The University of Texas at Austin & Sony AI) - Beyond Sim2Real: Leveraging Simulation in Support of Embodied World Models for Real-World Robot Learning
- 11:30 - 12:00 Keynote #6 Glen Berseth (Mila) - Using Foundational Models for Embodied Control
- 12:00 - 13:00 Lunch Break + Poster Session II
- 13:00 - 13:15 Industry Demo Gianluca Corrado (Wayve) & Lorenzo Bertoni (Wayve) - Scaling World Models to Power Evaluation and Validation
- 13:15 - 13:45 Keynote #7 John Langford (Microsoft Research New York) - Next-Latent Prediction Transformers Learn Compact World Models
- 13:45 - 14:15 Keynote #8 Yilun Du (Harvard) - Building Intelligent Robots with World Models
- 14:15 - 14:45 Keynote #9 Philip J. Ball (DeepMind) - Genie 3: A new frontier for world models
- 14:45 - 15:35 Panel Discussion. Yilun Du (Harvard), Jiajun Wu (Stanford University), John Langford (Microsoft Research New York), Glen Berseth (Mila), Lin Shao (National University of Singapore), Gianluca Corrado (Wayve).
- 15:35 - 16:05 Keynote #10 Jianlan Luo (AgiBot) - World Model Powered Robotic Manipulation
- 16:05 - 16:35 Keynote #11 Pablo Samuel Castro (DeepMind & Mila) - Automated Reward Machines via Foundation Models for Compositional Reinforcement Learning
- 16:35 - 16:41 Oral: Sandeep Routray (CMU) - ViPRA: Video Prediction for Robot Actions
- 16:41 - 16:47 Oral: Liliang Chen (AgiBot) - EnerVerse-AC: Envisioning Embodied Environments with Action Condition
- 16:47 - 16:53 Oral: Chongkai Gao (NUS) - VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models
- 16:53 - 16:59 Oral: Shashank Hegde (USC) - Latent Weight Diffusion: Generating reactive policies instead of trajectories
- 16:59 - 17:05 Oral: Chenhao Li (ETH Zurich) - Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics [Video]
- 17:05 - 17:20 Paper Award & Closing Remarks & Social
Invited Speakers & Panelists
Organizers
Assistance
Accepted Papers
Please check the OpenReview workshop page for full content.
- (Oral) ViPRA: Video Prediction for Robot Actions
  Sandeep Routray · Hengkai Pan · Unnat Jain · Shikhar Bahl · Deepak Pathak
- (Oral) EnerVerse-AC: Envisioning Embodied Environments with Action Condition
  Yuxin Jiang · Shengcong Chen · Siyuan Huang · Liliang Chen · Pengfei Zhou · Yue Liao · Xindong HE · Chiming Liu · Hongsheng Li · Maoqing Yao · Guanghui Ren
- (Oral) VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models
  Chongkai Gao · Zixuan Liu · Zhenghao Chi · Junshan Huang · Xin Fei · Yiwen Hou · Yuxuan Zhang · Yudi Lin · Zhirul Fang · Zeyu Jiang · Lin Shao
- (Oral) Latent Weight Diffusion: Generating reactive policies instead of trajectories
  Shashank Hegde · Satyajeet Das · Gautam Salhotra · Gaurav Sukhatme
- (Oral) Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics [Video]
  Chenhao Li · Andreas Krause · Marco Hutter
- Divide and Merge: Motion and Semantic Learning in End-to-End Autonomous Driving
  Yinzhe Shen · Omer Sahin Tas · Kaiwen Wang · Royden Wagner · Christoph Stiller
- NinA: Normalizing Flows in Action. Training VLA Models with Normalizing Flows
  Denis Tarasov · Alexander Nikulin · Ilya Zisman · Albina Klepach · Nikita Lyubaykin · Andrei Polubarov · Alexander Derevyagin · Vladislav Kurenkov
- Communicating Plans, Not Percepts: Scalable Multi-Agent Coordination with Embodied World Models
  Brennen Hill · Mant Wei · Jishnu Anandh Thangavel
- Coupled Distributional Random Expert Distillation for World Model Online Imitation Learning
  Shangzhe Li · Zhiao Huang · Hao Su
- LLM-Guided Probabilistic Program Induction for POMDP Model Estimation
  Aidan Curtis · Hao Tang · Thiago Veloso · Kevin Ellis · Josh Tenenbaum · Tomás Lozano-Pérez · Leslie Kaelbling
- Exploring exploration with foundation agents in interactive environments
  Daniel Sawyer · Nan Rosemary Ke · Hubert Soyer · Martin Engelcke · John Reid · David Reichert · Drew Hudson · Alexander Lerchner · Danilo Jimenez Rezende · Timothy Lillicrap · Michael Mozer · Jane Wang
- Bridging the Sim-to-Real Gap in Humanoid Dynamics via Learned Nonlinear Operators
  Jieming Cui · Zhenghao Qi · Yutang Lin · Yifei Zhao · Yuntian Hu · Lei Huang · Shuang Qiu · Rita Zhang · Bin He · Yixin Zhu
- FalconWing: An Ultra-Light Fixed-Wing Platform for Indoor Aerial Applications
  Yan Miao · Will Shen · Hang Cui · Sayan Mitra
- Decoupled Planning and Execution with LLM-Driven World Models for Efficient Task Planning
  Guoqing Ma
- RDAR: Reward-Driven Agent Relevance Estimation for Autonomous Driving
  Carlo Bosio · Greg Woelki · Noureldin Hendy · Nick Roy · Byungsoo Kim
- Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making
  Fan Feng · Selena Ge · Minghao Fu · Zijian Li · Yujia Zheng · Zeyu Tang · Yingyao Hu · Biwei Huang · Kun Zhang
- The Physical Basis of Prediction: World Model Formation in Neural Organoids via an LLM-Generated Curriculum
  Brennen Hill
- ScenePhys — Controllable Physics Videos for World-Model Evaluation
  Arshia Hemmat · Ernad Aghahosseini · Alireza Nasri · Mohammad Hossein Shaker Ardakani · Amirmasoud Rismanchian · Ali Mamanpoosh · Afsaneh Fatemi
- A Smooth Sea Never Made a Skilled $\texttt{SAILOR}$: Robust Imitation via Learning to Search
  Arnav Kumar Jain · Vibhakar Mohta · Subin Kim · Atiksh Bhardwaj · Juntao Ren · Yunhai Feng · Sanjiban Choudhury · Gokul Swamy
- In-Context Policy Iteration for Dynamic Manipulation
  Mark Van der Merwe · Devesh Jha
- Adversarial Diffusion for Robust Reinforcement Learning
  Daniele Foffano · Alessio Russo · Alexandre Proutiere
- Learning to Focus: Prioritizing Informative Histories with Structured Attention Mechanisms in Partially Observable Reinforcement Learning
  Daniel De Dios Allegue · Jinke He · Frans Oliehoek
- PolicyGRID: Acting to Understand, Understanding to Act
  Taqiya Ehsan · Shuren Xia · Jorge Ortiz
- Sim-to-Real Contact-Rich Pivoting via Optimization-Guided RL with Vision and Touch
  Yuki Shirai · Kei Ota · Devesh Jha · Diego Romeres
- Opinion: Towards Unified Expressive Policy Optimization for Robust Robot Learning
  Haidong Huang · Haiyue Zhu · Jiayi Song · Xixin Zhao · Yaohua Zhou · Jiayi Zhang · Yuze Zhai · Xiaocong Li
- Opinion: Learning Intuitive Physics Requires More Than Visual Data
  Ellen Su · Solim LeGris · Todd Gureckis · Mengye Ren
- Opinion: A Unified World Model is the cornerstone for integrating perception, reasoning, and decision-making in embodied AI
  Yipeng Xu
- Geosteering Through the Lens of Decision Transformers: Toward Embodied Sequence Decision-Making
  Hibat Errahmen DJECTA
- Avi: A 3D Vision-Language Action Model Architecture generating Action from Volumetric Inference
  Harris Song · Long Le
- Hierarchical Task Environments as the Next Frontier for Embodied World Models in Robot Soccer
  Brennen Hill
- Plan Verification for LLM-Based Embodied Task Completion Agents
  Ananth Harharan · Vardhan Dongre · Dilek Tur · Gokhan Tur
- WHALE: Towards Generalizable and Scalable World Models for Embodied Decision-making
  Zhilong Zhang · Ruifeng Chen · Junyin Ye · Yihao Sun · Haoxiang Ren · Xinghua Du · Pengyuan Wang · Jing-Cheng Pang · Kaiyuan Li · Tian-Shuo Liu · Haoxin Lin · Yang Yu · Zhi-Hua Zhou
- OpenGVL - Benchmarking Visual Temporal Progress for Data Curation
  Pawel Budzianowski · Emilia Wiśnios · Gracjan Góral · Igor Kulakov · Viktor Petrenko · Krzysztof Walas
- Foundation Models as World Models: A Foundational Study in Text-Based GridWorlds
  Remo Sasso · Michelangelo Conserva · Dominik Jeurissen · Paulo Rauber
- Steering Diffusion Policies with Value-Guided Denoising
  Hanming Ye
- SPUR: Scaling Reward Learning from Human Demonstrations
  Anthony Liang · Yigit Korkmaz · Jiahui Zhang · Jesse Zhang · Abrar Anwar · Sid Kaushik · Yufei Wang · Yu Xiang · David Held · Dieter Fox · Abhishek Gupta · Stephen Tu · Erdem Bıyık
- Stable Planning through Aligned Representations in Model-Based Reinforcement Learning
  Misagh Soltani · Forest Agostinelli
- SpatialThinker: Reinforcing 3D Reasoning in Multimodal LLMs via Spatial Rewards
  Hunar Batra · Haoqin Tu · Hardy Chen · Yuanze Lin · Chang Xie · Ronald Clark
- How Foundational Skills Influence VLM-based Embodied Agents: A Native Perspective
  Bo Peng · Pi Bu · Keyu Pan · Xinrun Xu · Miao Chen · Yang Du · Lin Li · Jun Song · Tong Xu · Bo Zheng
- Beyond Experience: Fictive Learning as an Inherent Advantage of World Models
  Jianning Chen · Masakazu Taira · Kenji Doya
- Opinion: Small VLAs Self-Learn Consistency
  Francesco Capuano · Adil Zouitine · Michel Aractingi
- HDFlow: Hierarchical Diffusion-Flow Planning for Long-horizon Robotic Assembly
  Gireesh Nandiraju · Yuanliang Ju · Chaoyi Xu · He Wang
- Mobile Manipulation with Active Inference for Long-Horizon Rearrangement Tasks
  Corrado Pezzato · Ozan Catal · Toon Van de Maele · Riddhi Pitliya · Tim Verbelen
- Towards Fine-tuning a Small Vision-Language Model for Aerial Navigation
  Hakob Tamazyan · Narek Nurijanyan · Boris Martirosyan · Hrant Khachatrian
- CRISP: Contact-guided Real2Sim from Monocular Video with Planar Scene Primitives
  Zihan Wang · Jiashun Wang · Jeff Tan · Yiwen Zhao · Jessica Hodgins · Shubham Tulsiani · Deva Ramanan
- Abstract Sim2Real through Approximate Information States
  Yunfu Deng · Josiah Hanna
- Opinion: How Can Causal AI Benefit World Models?
  Qiuling Pan · Hong Zhou · Zhouchen Lin
- ROPES: Robotic Pose Estimation via Score-based Causal Representation Learning
  Pranamya Kulkarni · Puranjay Datta · Emre Acartürk · Burak Varıcı · Karthikeyan Shanmugam · Ali Ahmed
- FLAM: Scaling Latent Action Models with Factorization
  Chang Shi · Zizhao Wang · Jiaheng Hu · Roberto Martín-Martín · Peter Stone
- Improvisational Reasoning with Vision-Language Models for Grounded Procedural Planning
  Masudur Rahman · Yupeng Zhuo · Juan Wachs
- Vision-Language Reasoning for Burn Depth Assessment with Structured Diagnostic Hypotheses
  Masudur Rahman · Mohamed Masry · Kristo Nuutila · Gayle Gordillo · Juan Wachs
Media
Sponsor
Contact
For questions about the workshop, please contact us at: