GenBot
Generative Models for Robot Learning
ICLR 2025 Workshop, Singapore
April 27th, 2025, from 9:00 AM to 3:25 PM
📢 Agenda Updated on April 27th, 2025! Please check the latest schedule below.
Best Paper Award
Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone
The next generation of robots will need to combine ideas from fields such as computer vision, natural language processing, and machine learning, because closed-loop systems are required to handle complex tasks from multimodal input in complicated real-world environments. This workshop focuses on generative models for robot learning, an important and fundamental area at the intersection of AI and robotics.
Topics of interest include, but are not limited to:
- Robotics data generation. (i) How can we build simulators with diverse assets that have rich interactive properties, and how can we accurately simulate the physical consequences of diverse robot actions? (ii) How can we accelerate the generation of successful trajectories in simulation environments? (iii) What are the challenges and possible solutions for alleviating the visual domain gap between simulators and the real world?
- Generative policy learning. (i) How can we design a generative visual representation learning framework that effectively embeds spatiotemporal information about the scene via self-supervision? (ii) How can we efficiently construct world models for scalable robot learning, and what information about the scene and the robot must be captured to obtain accurate feedback from the world model? (iii) How can we extend state-of-the-art generative models, such as diffusion models in computer vision and autoregressive models in natural language processing, to policy generation?
- Foundation model grounding. (i) What are the general criteria for designing LLM prompts for robot tasks? (ii) How can we build scalable, efficient, and generalizable representations of physical scenes to ground the action predictions of VLMs? (iii) How can we improve sample efficiency in VLA model training, and how can we efficiently adapt pre-trained VLA models to novel robot tasks?
- On-device generative model deployment. (i) What are the complexity bottlenecks in current pre-trained large generative models, and how can we identify and remove redundant architectural components? (ii) How can we dynamically maintain an optimal accuracy-efficiency trade-off as resource limits change with battery level and utilization? (iii) How can we develop compilation toolchains for pre-trained large generative models on robotic computing platforms that deliver significant real-world speedups and memory savings?
Keynote Speakers
Xiaojuan Qi, Sergey Levine, Shuran Song, Daquan Zhou, Yilun Du, Qi Dou
Schedule
| Session | Time |
| --- | --- |
| Opening Remarks and Welcome | 09:00-09:05 |
| Invited Talk: Xiaojuan Qi | 09:05-09:45 |
| Invited Talk: Sergey Levine | 09:45-10:25 |
| Coffee Break | 10:25-10:40 |
| Invited Talk: Shuran Song | 10:40-11:20 |
| Oral Session:<br>Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone (11:20-11:30)<br>Latent Action Pretraining from Videos (11:30-11:40)<br>TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies (11:40-11:50) | 11:20-11:50 |
| Lunch Break | 11:50-12:20 |
| Poster Session | 12:20-13:20 |
| Invited Talk: Daquan Zhou | 13:20-14:00 |
| Invited Talk: Yilun Du | 14:00-14:40 |
| Invited Talk: Qi Dou | 14:40-15:20 |
| Closing Remarks | 15:20-15:25 |
Accepted Papers
- AVID: Adapting Video Diffusion Models to World Models
  Marc Rigter, Tarun Gupta, Agrin Hilmkil, Chao Ma
- Bidirectional Decoding: Improving Action Chunking via Closed-Loop Resampling
  Yuejiang Liu, Jubayer Hamid, Yoonho Lee, Annie Xie, Maximilian Du, Chelsea Finn
- Contrastive Initial State Buffer for Reinforcement Learning
  Nico Messikommer, Yunlong Song, Davide Scaramuzza
- DemoGen: Synthetic Demonstration Generation for Data-Efficient Visuomotor Policy Learning
  Zhengrong Xue, Shuying Deng, Zhenyang Chen, Yixuan Wang, Yixuan Wang, Huazhe Xu
- DexTrack: Towards Generalizable Neural Tracking Control for Dexterous Manipulation from Human References
  Xueyi Liu, Jianibieke Adalibieke, Qianwei Han, Yuzhe Qin, Li Yi
- Diffusion Model Predictive Control
  Guangyao Zhou, Sivaramakrishnan Swaminathan, Rajkumar Vasudeva Raju, J. Swaroop Guntupalli, Wolfgang Lehrach, Joseph Ortiz, Antoine Dedieu, Miguel Lázaro-Gredilla, Kevin Murphy
- Environment as Policy: Generative Curriculum for Autonomous Racing
  Jiaxu Xing, Hongze Wang, Nico Messikommer, Davide Scaramuzza
- EQM-MPD: Equivariant On-Manifold Motion Planning Diffusion
  Evangelos Chatzipantazis, Nishanth Arun Rao, Kostas Daniilidis
- ET-Plan-Bench: Embodied Task-level Planning Benchmark Towards Spatial-Temporal Cognition with Foundation Models
  Lingfeng Zhang, Yuening Wang, Hongjian Gu, Atia Hamidizadeh, Zhanguang Zhang, Yuecheng Liu, Yutong Wang, David Gamaliel Bravo, Junyi Dong, Shunbo Zhou, Tongtong Cao, Yuzheng Zhuang, Yingxue Zhang, Jianye Hao
- FP3: A 3D Foundation Policy for Robotic Manipulation
  Rujia Yang, Geng Chen, Chuan Wen, Yang Gao
- Generative Quality Diversity Imitation Learning for Robot Skill Acquisition
  Zhenglin Wan, Xingrui Yu, David Bossens, Yueming Lyu, Qing Guo, Flint Xiaofeng Fang, Ivor Tsang
- (Oral) Latent Action Pretraining from Videos
  Seonghyeon Ye, Joel Jang, Byeongguk Jeon, Sejune Joo, Jianwei Yang, Baolin Peng, Ajay Mandlekar, Reuben Tan, Yu-Wei Chao, Bill Yuchen Lin, Lars Liden, Kimin Lee, Jianfeng Gao, Luke Zettlemoyer, Dieter Fox, Minjoon Seo
- Learning from Massive Human Videos for Universal Humanoid Pose Control
  Jiageng Mao, Siheng Zhao, Siqi Song, Tianheng Shi, Junjie Ye, Mingtong Zhang, Haoran Geng, Jitendra Malik, Vitor Guizilini, Yue Wang
- Learning Novel Skills from Language-Generated Demonstrations
  Ao-Qun Jin, Tian-Yu Xiang, Xiao-Hu Zhou, Mei-Jiang Gui, Xiao-Liang Xie, Shi-Qi Liu, Shuang-Yi Wang, Yue Cao, Sheng-Bin Duan, Fu-Chao Xie, Zeng-Guang Hou
- Modality-Composable Diffusion Policy via Inference-Time Distribution-level Composition
  Jiahang Cao, Qiang Zhang, Hanzhong Guo, Jiaxu Wang, Hao Cheng, Renjing Xu
- Offline Learning of Controllable Diverse Behaviors
  Mathieu Petitbois, Rémy Portelas, Sylvain Lamprier, Ludovic Denoyer
- Overcoming Slow Decision Frequencies in Continuous Control: Model-Based Sequence Reinforcement Learning for Model-Free Control
  Devdhar Patel, Hava Siegelmann
- PEAR: Primitive Enabled Adaptive Relabeling for Boosting Hierarchical Reinforcement Learning
  Utsav Singh, Vinay P Namboodiri
- (Oral) Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone
  Max Sobol Mark, Tian Gao, Georgia Sampaio, Mohan Kumar, Archit Sharma, Chelsea Finn, Aviral Kumar
- Responsive Noise-Relaying Diffusion Policy: Responsive and Efficient Visuomotor Control
  Zhuoqun Chen, Xiu Yuan, Tongzhou Mu, Hao Su
- RL Zero: Zero-Shot Language to Behaviors Without Any Supervision
  Harshit Sikchi, Siddhant Agarwal, Pranaya Jajoo, Samyak Parajuli, Caleb Chuck, Max Rudolph, Peter Stone, Amy Zhang, Scott Niekum
- SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation
  Haoquan Fang, Markus Grotz, Wilbert Pumacay, Yi Ru Wang, Dieter Fox, Ranjay Krishna, Jiafei Duan
- Sampling from Energy-based Policies using Diffusion
  Vineet Jain, Tara Akhound-Sadegh, Siamak Ravanbakhsh
- Solving New Tasks by Adapting Internet Video Knowledge
  Calvin Luo, Zilai Zeng, Yilun Du, Chen Sun
- Stem-OB: Generalizable Visual Imitation Learning with Stem-Like Convergent Observation through Diffusion Inversion
  Kaizhe Hu, Zihang Rui, Yao He, Yuyao Liu, Pu Hua, Huazhe Xu
- (Oral) TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies
  Ruijie Zheng, Yongyuan Liang, Shuaiyi Huang, Jianfeng Gao, Hal Daumé III, Andrey Kolobov, Furong Huang, Jianwei Yang
- VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks
  Shiduo Zhang, Zhe Xue, Peiju Liu, Xiaopeng Yu, Yuan Li, Qinghui Gao, Zhaoye Fei, Zhangyue Yin, Zuxuan Wu, Yugang Jiang, Xipeng Qiu
Organizers
Student Organizers