VLADR: Vision and Language for Autonomous Driving and Robotics
CVPR 2024 Workshop, Seattle, WA, USA
Jun 18th (Tuesday), 2024
Introduction
Vision and language processing are becoming increasingly intertwined, especially in autonomous driving and robotics. The necessity for this symbiosis is apparent in the multifaceted dynamics of real-world environments. An autonomous vehicle operating in an urban environment, for instance, should not rely on its visual sensors alone for pedestrian detection; it must also interpret and act upon auditory signals, such as vocalized warnings. Similarly, robots that integrate visual data with linguistic context promise more adaptive behavior, particularly in diverse settings.

This workshop will spotlight data-centric autonomous driving, emphasizing vision-based techniques. Central to our discussions will be vision and language for autonomous driving, language-driven perception, and simulation. We will delve into vision and language representation learning and explore the future of multimodal motion prediction and planning in robotics. Recognizing the rapid expansion of this field, the introduction of new datasets and metrics for multimodal learning will also be on our agenda, as will privacy concerns associated with multimodal data. Moreover, our emphasis will rest firmly on safety: ensuring that systems correctly interpret and act on both visual and linguistic inputs, thereby preventing mishaps in real-world scenarios.

Through a comprehensive examination of these topics, this workshop seeks to foster a deeper academic understanding of the intersection between vision and language in autonomous systems. By convening experts from interdisciplinary fields, our objective is to survey current state-of-the-art methodologies, address open challenges, and chart avenues for future work, ensuring our findings resonate within both the academic and industrial communities.
Call for Papers
The CVPR 2024 Vision and Language for Autonomous Driving and Robotics Workshop (https://vision-language-adr.github.io) will center on data-centric autonomous driving, with a particular focus on vision-based methods.
This workshop is intended to:
- Explore areas in robotics where vision and language could help
- Encourage communication and collaboration on vision and language for autonomous agents
- Provide an opportunity for the CVPR community to discuss this exciting and growing area of multimodal representations
We welcome paper submissions on all topics related to vision and language for autonomous driving and robotics, including but not limited to:
- Vision and language for autonomous driving
- Language-driven perception
- Language-driven sensor and traffic simulation
- Vision and language representation learning
- Multimodal motion prediction and planning for robotics
- New datasets and metrics for multimodal learning
- Safety: Ensuring that systems can correctly interpret and act upon visual and linguistic inputs in real-world situations to prevent accidents
- Language agents for robotics
- Language-based scene understanding for driving scenarios
- Multi-modal fusion for end-to-end autonomous driving
- Large Language Models (LLMs) as task planners
- Other applications of LLMs to driving and robotics
Style and Author Instructions
- Paper Length: We ask authors to use the official CVPR 2024 template and limit submissions to 4-8 pages, excluding references.
- Dual Submissions: The workshop is non-archival. In addition, in light of the new single-track policy of CVPR 2024, we strongly encourage authors of papers accepted to CVPR 2024 to also present them at our workshop.
- Presentation Forms: All accepted papers will get poster presentations during the workshop; selected papers will get oral presentations.
All submissions should be anonymized. Papers with more than 4 pages (excluding references) will be reviewed as long papers, and papers with more than 8 pages (excluding references) will be rejected without review. Supplementary material is optional; supported formats are pdf, mp4, and zip. All papers that were not previously presented at a major conference will be peer-reviewed by three experts in the field in a double-blind manner. If you are submitting a previously accepted conference paper, please also attach a copy of the acceptance notification email in the supplementary material.
All submissions should adhere to the CVPR 2024 author guidelines.
Contact: If you have any questions, please contact vladr@googlegroups.com.
Submission Portal: https://openreview.net/group?id=thecvf.com/CVPR/2024/Workshop/VLADR
Paper Review Timeline:
| Milestone | Date |
|---|---|
| Paper submission and supplemental material deadline | March 29th, 2024 (PST) |
| Notification to authors | |
| Camera ready deadline | |
Invited Speakers
Jitendra Malik
Professor at UC Berkeley
Trevor Darrell
Professor at UC Berkeley
Chelsea Finn
Assistant Professor at Stanford University
Fei Xia
Research Scientist at Google DeepMind Robotics
Fei Xia is a Research Scientist at Google Research, where he works on the Robotics team. He received his PhD from the Department of Electrical Engineering at Stanford University, co-advised by Silvio Savarese in SVL and Leonidas Guibas. His mission is to build intelligent embodied agents that can interact with complex and unstructured real-world environments, with applications to home robotics. He has been approaching this problem from three aspects: 1) large-scale and transferable simulation for robotics; 2) learning algorithms for long-horizon tasks; 3) combining geometric and semantic representations of environments. Most recently, he has been exploring the use of foundation models for robot decision-making.
Long Chen
Staff Scientist at Wayve
Long Chen is a Staff Scientist at Wayve, focusing on building Vision Language Action Models (VLAM) for the next wave of autonomous driving, including groundbreaking work on Driving with LLMs and LINGO. Previously, he was a research engineer at Lyft Level 5, where he led the development of data-driven planning models using crowd-sourced data for Lyft's self-driving cars. Long received a PhD from Bournemouth University and a master’s degree from University College London, where his research focused on applying AI in various domains such as mixed reality, surgical robotics, and healthcare.
Tentative Schedule
| Session | Time |
|---|---|
| Opening remarks and welcome | 08:55 AM - 09:00 AM |
| Chelsea Finn | 09:00 AM - 09:45 AM |
| Trevor Darrell | 10:00 AM - 10:45 AM |
| Jitendra Malik | 11:05 AM - 11:50 AM |
| Poster Session & Lunch | 12:00 PM - 02:00 PM |
| Fei Xia | 02:00 PM - 02:45 PM |
| Long Chen | 03:00 PM - 03:45 PM |
| Oral Session | 04:00 PM - 05:00 PM |
Accepted Papers
[Oral] RoboEXP: Action-Conditioned Scene Graph via Interactive Exploration for Robotic Manipulation
Hanxiao Jiang, Binghao Huang, Ruihai Wu, Zhuoran Li, Shubham Garg, Hooshang Nayyeri, Shenlong Wang, Yunzhu Li
[OpenReview]

On the Safety Concerns of Deploying LLMs/VLMs in Robotics: Highlighting the Risks and Vulnerabilities
Xiyang Wu, Ruiqi Xian, Tianrui Guan, Jing Liang, Souradip Chakraborty, Fuxiao Liu, Brian M. Sadler, Dinesh Manocha, Amrit Bedi
[OpenReview]

Optimizing Visual Question Answering Models for Driving: Bridging the Gap Between Human and Machine Attention Patterns
Kaavya Rekanar, Martin Hayes, Ganesh Sistu, Ciaran Eising
[OpenReview]

[Oral] Collision Avoidance Metric for 3D Camera Evaluation
Vage Taamazyan, Alberto Dall'Olio, Agastya Kalra
[OpenReview]

Multi-Frame, Lightweight & Efficient Vision-Language Models for Question Answering in Autonomous Driving
Akshay Gopalkrishnan, Ross Greer, Mohan Trivedi
[OpenReview]

[Oral] DriveLM: Driving with Graph Visual Question Answering
Chonghao Sima, Katrin Renz, Kashyap Chitta, Li Chen, Hanxue Zhang, Chengen Xie, Jens Beißwenger, Ping Luo, Andreas Geiger, Hongyang Li
[OpenReview]

[Oral] AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving
Mingfu Liang, Jong-Chyi Su, Samuel Schulter, Sparsh Garg, Shiyu Zhao, Ying Wu, Manmohan Chandraker
[OpenReview]

Ambiguous Annotations: When is a Pedestrian not a Pedestrian?
Luisa Schwirten, Jannes Scholz, Daniel Kondermann, Janis Keuper
[OpenReview]

Envisioning the Unseen: Revolutionizing Indoor Spaces with Deep Learning-Enhanced 3D Semantic Segmentation
Muhammad Arif
[OpenReview]

Explanation for Trajectory Planning using Multi-modal Large Language Model for Autonomous Driving
Muhammad Arif
[OpenReview]

Safedrive Dreamer: Navigating Safety-Critical Scenarios in the Real-world with World Models
Bangan Wang, Haitao Li, Tianyu Shi
[OpenReview]

Improving End-To-End Autonomous Driving with Synthetic Data from Latent Diffusion Models
Harsh Goel, Sai Shankar Narasimhan
[OpenReview]

ATLAS: Adaptive Landmark Acquisition using LLM-Guided Navigation
Utteja Kallakuri, Bharat Prakash, Arnab Neelim Mazumder, Hasib-Al Rashid, Nicholas R Waytowich, Tinoosh Mohsenin
[OpenReview]

DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences
Yidong Huang, Jacob Sansom, Ziqiao Ma, Felix Gervits, Joyce Chai
[OpenReview]

Driver Activity Classification Using Generalizable Representations from Vision-Language Models
Ross Greer, Mathias Viborg Andersen, Andreas Møgelmose, Mohan Trivedi
[OpenReview]

Language-Driven Active Learning for Diverse Open-Set 3D Object Detection
Ross Greer, Bjørk Antoniussen, Andreas Møgelmose, Mohan Trivedi
[OpenReview]

Evolutionary Reward Design and Optimization with Multimodal Large Language Models
Ali Emre Narin
[OpenReview]

[Oral] Open6DOR: Benchmarking Open-instruction 6-DoF Object Rearrangement and A VLM-based Approach
Yufei Ding, Haoran Geng, Chaoyi Xu, Xiaomeng Fang, Jiazhao Zhang, Songlin Wei, Qiyu Dai, Zhizheng Zhang, He Wang
[OpenReview]