Structured World Models for Robotic Manipulation
Email: swomo-rss25@googlegroups.com
Room 124, Seeley G. Mudd Building (SGM)
Livestream: Zoom Link
Important Notice for Participants & Speakers
Please arrive early for registration before proceeding to the workshop venue. This is especially important for speakers and presenters scheduled early in the program.
Registration & Badge Pick-Up Schedule:
- 7:30am–10am: Epstein Family Plaza
- 11am–8pm: Bovard Auditorium
- Note: Registration desk closed 10am–11am
Overview
Physics-based models have been crucial for manipulation, enabling sim-to-real learning, model-predictive control, manipulation planning, and model-based design and verification. However, they typically require extensive manual effort and often fail to capture real-world complexity. Advances in generative modeling—particularly video models—offer a data-driven alternative but struggle with physical plausibility, consistency, and action conditioning. A promising direction is to integrate structured priors with scalable data-driven methods to improve dynamics prediction and generalization across diverse scenarios.
This workshop will explore key and timely topics, including state-action representations, supervision sources, generalizable inductive biases, the role of (generative) simulation and video models, and trade-offs in downstream planning, control, policy learning, and evaluation.
We will bring together researchers from robotics, machine learning, and computer vision. The workshop targets audiences in manipulation, world modeling, reinforcement learning, and sim-to-real learning. Posters, panels, and live polls will foster debate and cross-community dialogue, allowing attendees to actively contribute to discussions.
Discussion Topics
- What should be the representation of state and action?
- Where should the supervision (i.e., training data) come from?
- How do we deal with noisy training data, and how do we handle highly occluded scenarios where ground truth states are hard to access?
- What is the place of simulated data?
- Is explicit 3D modeling essential? What are the limitations of end-to-end approaches?
- What inductive biases should be incorporated into the model, and what are the trade-offs in terms of scalability and generalization?
- Is photometric reconstruction necessary? Is it synergistic? If so, how?
- What granularity should the model operate on, in terms of both space and time?
- What are the pros and cons of leveraging existing foundation models, like video diffusion models?
- How do we learn/acquire models for efficient downstream planning or policy learning?
- What modalities should be incorporated as inputs?
- Can a world model evolve and be learned live during interactions?
- What parts of the world are relevant to the model, and how much does accurate dynamics modeling matter?
Invited Speakers
Chuang Gan
UMass Amherst | MIT-IBM
Rares Ambrus
Toyota Research Institute
Katerina Fragkiadaki
Carnegie Mellon University
Event Schedule
| Time | Session |
| 8:00 - 8:05 | Opening Remarks |
| 8:05 - 8:25 | Paper Oral Presentations |
| 8:25 - 8:50 | Jerome Revaud: The *3R family: a Foundation Model for 3D vision |
| 8:50 - 9:15 | Katerina Fragkiadaki: From Explicit Physics Engines to Neural Simulators with Generative Models |
| 9:15 - 9:40 | Jonathan Tremblay: Building Bridges, a Kid's Dream |
| 9:40 - 10:05 | Chuang Gan: Virtual Community: A World Simulator for Humans, Robots, and Society |
| 10:05 - 10:30 | Yunzhu Li: Simulating and Manipulating Deformable Objects with Structured World Models |
| 10:30 - 11:30 | Coffee Break & Poster Sessions |
| 11:30 - 11:55 | Rares Ambrus: Structured Large Behavior Models for Dexterous Manipulation |
| 11:55 - 12:25 | Panel Discussion |
| 12:25 - 12:30 | Awards & Closing Remarks |
Accepted Papers
- [Oral] FLARE: Robot Learning with Implicit World Modeling
Ruijie Zheng, Jing Wang, Scott Reed, Johan Bjorck, Yu Fang, Fengyuan Hu, Joel Jang, Kaushil Kundalia, Zongyu Lin, Loïc Magne, Avnish Narayan, You Liang Tan, Guanzhi Wang, Qi Wang, Jiannan Xiang, Yinzhe Xu, Seonghyeon Ye, Jan Kautz, Furong Huang, Yuke Zhu, Linxi Fan
- [Oral] One-Shot Real-to-Sim via End-to-End Differentiable Simulation and Rendering
Yifan Zhu, Aaron Dollar, Zherong Pan
- [Oral] WoMAP: World Models For Embodied Open-Vocabulary Object Localization
Tenny Yin, Zhiting Mei, Tao Sun, Lihan Zha, Miyu Yamane, Emily Zhou, Jeremy Bao, Ola Sho, Anirudha Majumdar
- [Oral] DiWA: Diffusion Policy Adaptation with World Models
Akshay L Chandra, Iman Nematollahi, Chenguang Huang, Tim Welschehold, Abhinav Valada
- DyWA: Dynamics-adaptive World Action Model for Generalizable Non-prehensile Manipulation
Jiangran Lyu, Ziming Li, Xuesong Shi, Chaoyi Xu, Yizhou Wang, He Wang
- Multi-Objective Photoreal Simulation (MOPS) Dataset for Computer Vision in Robotic Manipulation
Maximilian Xiling Li, Paul Mattes, Nils Blank, Korbinian Franz Rudolf, Paul Werner Lödige, Rudolf Lioutikov
- DexWild: Dexterous Human Interactions for In-the-Wild Robot Policies
Tony Tao, Mohan Kumar Srirama, Jason Jingzhou Liu, Kenneth Shaw, Deepak Pathak
- GenParticles: Probabilistic Particle-Based Modeling for Object-Centric Motion
Arijit Dasgupta, Eric Li, Mathieu Huot, William T. Freeman, Vikash Mansinghka, Joshua B. Tenenbaum
- Fusing vision and contact-rich physics improves object reconstruction under occlusion
Bibit Bianchini, Minghan Zhu, Mengti Sun, Bowen Jiang, Camillo Jose Taylor, Michael Posa
- Diffusion Dynamics Models with Generative State Estimation for Cloth Manipulation
Tongxuan Tian, Haoyang Li, Bo Ai, Xiaodi Yuan, Zhiao Huang, Hao Su
- Task-Oriented Grasping, Training-Free, Retrieval, Semantic Alignment, Generative examples
Shailesh, Alok Raj, Nayan Kumar, Priya Shukla, Andrew Melnik, Michael Beetz, Gora Chand Nandi
- Learning Dexterous Deformable Object Manipulation Through Cross-Embodiment Dynamics Learning
Zihao He, Bo Ai, Yulin Liu, Weikang Wan, Henrik I Christensen, Hao Su
- ZeroMimic: Distilling Robotic Manipulation Skills from Web Videos
Junyao Shi, Zhuolun Zhao, Tianyou Wang, Ian Pedroza, Amy Luo, Jie Wang, Yecheng Jason Ma, Dinesh Jayaraman
- Phys2Real: Physically-Informed Gaussian Splatting for Adaptive Sim-to-Real Transfer in Robotic Manipulation
Maggie Wang, Stephen Tian, Jiajun Wu, Mac Schwager
Call for Papers
Submission Portal: OpenReview
We cordially invite paper submissions relevant to the following (non-exhaustive) topics:
- Structured Priors for World Modeling
- Applications of World Models for Robotic Manipulation
- World Models for Policy Learning, Evaluation, and Verification
- Model-Based Planning, Control, and Reinforcement Learning
- State-Action Representation in World Modeling
- Video Models for Robotic Manipulation
- 3D/4D Reconstruction for Robotic Manipulation
- Generative Simulation for Robotic Manipulation
- Adaptation and Generalization of World Models
- Evaluation and Benchmarking of World Models
- Uncertainty and Robustness in World Modeling
Submission Guidelines
- Deadline: 11:59pm AoE (Anywhere on Earth) on May 31, 2025
- Page Limit: We welcome submissions of up to 4 pages, with an unlimited number of pages for references and appendices.
- Formatting: Authors are encouraged to use the RSS template for their submissions; templates from related conferences will also be accepted.
- Reviews: The authors of each submitted paper are expected to provide up to 3 reviews of other papers submitted to this workshop.
- Anonymity: We follow a double-blind review policy.
- Dual Submission: We welcome submissions that are under review or recently accepted for other workshops and/or conferences.
- Non-Archival: This workshop is non-archival. Accepted papers will be posted on the workshop website and OpenReview.
- Poster Presentation: Authors of accepted papers are expected to present their work in person at the workshop poster session.
Organizers
Wenlong Huang
Stanford University
Jad Abou-Chakra
Queensland University of Technology
Alberta Longhini
KTH Royal Institute of Technology
Kaifeng Zhang
Columbia University
Bardienus Pieter Duisterhof
Carnegie Mellon University