Overview
Recent years have seen significant progress in robot learning, resulting in robot policies that are more reliable and can be deployed across many scenes, tasks, and robot embodiments. This increase in capability calls for rethinking the robot development lifecycle of design, evaluation, and deployment. In the traditional lifecycle, a researcher designs a method aimed at raising the evaluation score on a handful of tasks at their own institution. As policies grow more capable, a more scalable, comprehensive, and reproducible evaluation framework is needed, and this lifecycle deserves treatment as a first-class problem alongside policy design. This workshop aims to address this gap by opening discussion on:
- What are good evaluation protocols and methods for robot learning?
- How can we make robot evaluation more reproducible and scalable, and less expensive?
- How do we monitor robot status during deployment and ensure safety and performance?
- How can research on safety and evaluation outside of robotics inspire that of robotics?
Schedule
Speakers
Panelists
Accepted Presentations
- VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models
  Chongkai Gao, Zixuan Liu, Zhenghao Chi, Junshan Huang, Xin Fei, Yiwen Hou, Yuxuan Zhang, Yudi Lin, Zhirui Fang, Zeyu Jiang, Lin Shao
- Test-Time Scaling of Vision-Language-Action Models via Self-Certainty
  Xu Luo, Jiaying Yang, Zehang Bai, Junlin Xie, Ji Zhang, Lianli Gao, Jingkuan Song
- Evaluating Manipulation Policies in Clutter
  Amir Rasouli, Montgomery Alban
- RoboEval: Where Robotic Manipulation Meets Structured and Scalable Evaluation
  Yi Ru Wang, Carter Ung, Grant Tannert, Jiafei Duan, Josephine Li, Amy Le, Markus Grotz, Rishabh Oswal, Wilbert Pumacay, Yuquan Deng, Ranjay Krishna, Dieter Fox, Siddhartha Srinivasa
- Identity-Conditioned Preference-Aware Table Tidying with LLM-in-the-Loop
  Bojun Long, Zhenhao Guo, Fan Zhu
- Occlusion-robust Pose Estimation for Multi-Robot Systems via Geometric-aware Diffusion Matching
  Suyoung Kang, Rishav Dutta, Peng Gao, Hao Zhang
- Can We Detect Failures Without Failure Data? Uncertainty-Aware Runtime Failure Detection for Imitation Learning Policies
  Chen Xu, Tony Khuong Nguyen, Emma Dixon, Christopher Rodriguez, Patrick Miller, Robert Lee, Paarth Shah, Rares Andrei Ambrus, Haruki Nishimura, Masha Itkina
- AURA: Autonomous Upskilling with Retrieval-Augmented Agents
  Alvin Zhu, Yusuke Tanaka, Andrew Goldberg, Dennis Hong
- N2M: Bridging Navigation and Manipulation by Learning Pose Preference from Rollout
  Kaixin Chai, Hyunjun Lee, Joseph J Lim
- Reliable and Scalable Robot Policy Evaluation with Imperfect Simulators
  Apurva Badithela, David Snyder, Lihan Zha, Joseph Mikhail, Matthew O'Kelly, Anushri Dixit, Anirudha Majumdar
- Score the Steps, Not Just the Goal: VLM-Based Subgoal Evaluation for Robotic Manipulation
  Ramy ElMallah, Krish Chhajer, Chi-Guhn Lee
- Benchmarking Affordance Generalization with BusyBox
  Dean Fortier, Timothy Adamson, Tess Hellebrekers, Teresa LaScala, Kofi Ennin, Michael Murray, Andrey Kolobov, Galen Mullins
- SPUR: Scaling Reward Learning from Human Demonstrations
  Anthony Liang, Yigit Korkmaz, Jiahui Zhang, Jesse Zhang, Abrar Anwar, Sidhant Kaushik, Yufei Wang, Yu Xiang, David Held, Dieter Fox, Abhishek Gupta, Stephen Tu, Erdem Biyik