Highlights
Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection
CVPR 2025
Zhizheng Zhang4, Zhongyuan Wang3, Tiejun Huang2,3, Lu Sheng1✉, He Wang2,3,4✉
- Code-as-Monitor is the first framework to integrate both reactive and proactive failure detection.
- Code-as-Monitor leverages the proposed constraint elements to simplify real-time failure detection with high precision.
- Code-as-Monitor achieves state-of-the-art (SOTA) performance in both simulated and real-world environments, and exhibits strong generalizability on unseen scenarios, tasks, and objects.
Summary Video
Motivation
For the task "Move the pan with lobster to the stove without losing the lobster", (a) reactive failure detection identifies failures after they occur, and (b) proactive failure detection prevents foreseeable failures. In (a), the robot detects the failure after the lobster unpredictably jumps out due to the heat. In (b), pan tilting is detected and corrected, which requires real-time precision. (c) shows that our method, combined with an open-loop policy, forms a closed-loop system that enables both proactive (e.g., detecting a moving glass during grasping) and reactive (e.g., detecting a toy being removed after grasping) failure detection in cluttered scenes.
Real-world Demos
These demos show that Code-as-Monitor can transform an open-loop policy into a closed-loop system by integrating reactive and proactive failure detection for long-horizon tasks in cluttered environments with disturbances.
Clear all objects on table except for animals. (2X speed)
Grasp the animals according to their distances to fruits, from nearest to farthest. (1X speed)
Abstract
Automatic detection and prevention of open-set failures are crucial in closed-loop robotic systems. Recent studies often struggle to simultaneously identify unexpected failures reactively after they occur and prevent foreseeable ones proactively. To this end, we propose Code-as-Monitor (CaM), a novel paradigm leveraging the vision-language model (VLM) for both open-set reactive and proactive failure detection. The core of our method is to formulate both tasks as a unified set of spatio-temporal constraint satisfaction problems and use VLM-generated code to evaluate them for real-time monitoring. To enhance the accuracy and efficiency of monitoring, we further introduce constraint elements that abstract constraint-related entities or their parts into compact geometric elements. This approach offers greater generality, simplifies tracking, and facilitates constraint-aware visual programming by leveraging these elements as visual prompts. Experiments show that CaM achieves a 28.7% higher success rate and reduces execution time by 31.8% under severe disturbances compared to baselines across three simulators and a real-world setting. Moreover, CaM can be integrated with open-loop control policies to form closed-loop systems, enabling long-horizon tasks in cluttered scenes with dynamic environments.
Method Overview
Overview of Code-as-Monitor. The framework unifies reactive and proactive failure detection through constraints, abstracts the relevant entities and parts into general-purpose constraint elements, and ensures precise, real-time monitoring by evaluating generated code.
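To make the idea of monitoring via code evaluation concrete, here is a minimal sketch of what a monitor for the pan example might look like. It is not the code CaM generates: the function names, the 10-degree threshold, and the assumption that the pan is represented by a surface element exposing a normal vector in a world frame with z pointing up are all hypothetical.

```python
import math

def tilt_angle_deg(normal):
    """Angle in degrees between a surface normal and the world up axis (0, 0, 1)."""
    nx, ny, nz = normal
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    if length == 0.0:
        raise ValueError("normal must be non-zero")
    # abs() makes the check independent of which way the normal points.
    cos_theta = max(-1.0, min(1.0, abs(nz) / length))
    return math.degrees(math.acos(cos_theta))

def pan_is_level(normal, max_tilt_deg=10.0):
    """Hypothetical constraint: the pan surface stays close to horizontal."""
    return tilt_angle_deg(normal) <= max_tilt_deg

def first_violation(normals, max_tilt_deg=10.0):
    """Return the index of the first frame that violates the constraint, or None."""
    for i, normal in enumerate(normals):
        if not pan_is_level(normal, max_tilt_deg):
            return i
    return None

# Tracked pan normals over three frames (made-up values for illustration).
frames = [(0.0, 0.0, 1.0), (0.0, 0.1, 1.0), (1.0, 0.0, 1.0)]
violation = first_violation(frames)
```

Because the check is a few arithmetic operations on a tracked geometric element, it can run on every frame, which is the property the overview attributes to code evaluation.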
Constraint Elements
Constraint Element Pipeline. Given a constraint, our model ConSeg generates instance-level and part-level masks across multiple views, which are projected into 3D space. Through a series of heuristics, the desired elements are produced. Once all elements are obtained, they are annotated onto the original multi-view images.
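The projection step can be illustrated with a small sketch. It assumes a standard pinhole camera model with known intrinsics (fx, fy, cx, cy) and a metric depth image, and it reduces the back-projected mask to a single point element by taking the centroid. The centroid is a stand-in chosen for illustration, not one of the heuristics used by the pipeline, and all names here are hypothetical.

```python
def backproject_mask(mask, depth, fx, fy, cx, cy):
    """Lift masked pixels with valid depth into 3D camera-frame points."""
    points = []
    for v, row in enumerate(mask):
        for u, inside in enumerate(row):
            z = depth[v][u]
            if inside and z > 0:
                # Pinhole model: pixel (u, v) at depth z maps to (x, y, z).
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points

def centroid(points):
    """Reduce a point set to a single point element (illustrative heuristic)."""
    if not points:
        raise ValueError("no valid points in mask")
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

# Tiny 2x2 example: the top row belongs to the segmented part.
mask = [[1, 1], [0, 0]]
depth = [[2.0, 2.0], [2.0, 2.0]]
points = backproject_mask(mask, depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
point_element = centroid(points)
```

With multiple views, each view's points would first be transformed into a shared world frame using the camera extrinsics before any element is fitted; that step is omitted here.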
CLIPort Simulator Demos
These demos show that Code-as-Monitor improves monitoring of the 3D spatial relationships among entities in the environment, facilitating both reactive and proactive failure detection and leading to more accurate counting.
Task: Stack in Order. Disturbances: Placement noise.
Task: Stack in Order. Disturbances: Random drop.
Task: Sweep half the blocks
Omnigibson Simulator Demos
These demos show that Code-as-Monitor can detect a richer set of failures (e.g., point-, line-, and surface-level disturbances) at lower computational cost than frequently querying VLMs.
Visualization of Constraint-aware Segmentation
We show additional constraint-aware segmentation results at both the instance level and the part level on out-of-distribution data, demonstrating strong generalizability to unseen scenarios, tasks, and objects.
BibTeX
@article{zhou2024code,
title={Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection},
author={Zhou, Enshen and Su, Qi and Chi, Cheng and Zhang, Zhizheng and Wang, Zhongyuan and Huang, Tiejun and Sheng, Lu and Wang, He},
journal={arXiv preprint arXiv:2412.04455},
year={2024}
}