Deep Reactive Policy
Learning Reactive Manipulator Motion Planning
for Dynamic Environments
Jiahui Yang*
Jason Jingzhou Liu* Yulong Li Youssef Khaky Kenneth Shaw Deepak Pathak
*Equal Contribution
Carnegie Mellon University
CoRL 2025
Abstract
Generating collision-free motion in dynamic, partially observable environments is a fundamental challenge for robotic manipulators. Classical motion planners can compute globally optimal trajectories but require full environment knowledge and are typically too slow for dynamic scenes. Neural motion policies offer a promising alternative by operating in closed-loop directly on raw sensory inputs but often struggle to generalize in complex or dynamic settings. We propose Deep Reactive Policy (DRP), a visuo-motor neural motion policy designed for reactive motion generation in diverse dynamic environments, operating directly on point cloud sensory input. At its core is IMPACT, a transformer-based neural motion policy pretrained on 10 million generated expert trajectories across diverse simulation scenarios. We further improve IMPACT's static obstacle avoidance through iterative student-teacher finetuning. We additionally enhance the policy's dynamic obstacle avoidance at inference time using DCP-RMP, a locally reactive goal-proposal module. We evaluate DRP on challenging tasks featuring cluttered scenes, dynamic moving obstacles, and goal obstructions. DRP achieves strong generalization, outperforming prior classical and neural methods in success rate across both simulated and real-world settings.
All videos play at 1x speed
Results Highlights
Our policy operates on point cloud observations to reach desired goal poses, with goals visualized as RGB frame axes in the videos below.
Cabinet Rearrangement
Collaborative Cooking
Fridge Rearrangement
Drawer Rearrangement
Kitchen Cleanup
Safe Human-Robot Interaction
Garbage Cleanup
Kitchen Sink
Method Overview
Deep Reactive Policy (DRP) is a visuo-motor neural motion policy designed for dynamic, real-world environments. First, the locally reactive DCP-RMP module adjusts joint goals to handle fast-moving dynamic obstacles in the local scene. Then, IMPACT, a transformer-based closed-loop motion planning policy, takes as input the scene point cloud, the modified joint goal, and the current robot joint position to output action sequences for real-time execution on the robot.
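To make the data flow above concrete, here is a minimal Python sketch of the closed-loop control structure it describes. All names (the DCP-RMP-style adjust_goal, the IMPACT-style predict, and the robot interface) are illustrative placeholders for this sketch, not the released implementation.

```python
# Illustrative sketch of the DRP closed-loop control flow described above.
# The module and method names are assumptions for this sketch only.
import numpy as np

class DRPController:
    """Closed-loop controller: DCP-RMP goal adjustment followed by IMPACT."""

    def __init__(self, dcp_rmp, impact_policy):
        self.dcp_rmp = dcp_rmp        # locally reactive goal-proposal module
        self.impact = impact_policy   # transformer-based neural motion policy

    def step(self, point_cloud, joint_goal, joint_pos):
        # 1) DCP-RMP modifies the joint goal to dodge fast-moving local obstacles.
        modified_goal = self.dcp_rmp.adjust_goal(point_cloud, joint_goal, joint_pos)
        # 2) IMPACT maps (scene point cloud, modified goal, current joints)
        #    to a sequence of joint-space actions.
        return self.impact.predict(point_cloud, modified_goal, joint_pos)

def run_to_goal(controller, robot, joint_goal, max_steps=1000, tol=1e-2):
    """Observe, plan, execute the first action chunk, and repeat."""
    for _ in range(max_steps):
        q = robot.get_joint_positions()
        if np.linalg.norm(q - joint_goal) < tol:   # reached the joint goal
            return True
        pcd = robot.get_point_cloud()              # raw depth-sensor point cloud
        actions = controller.step(pcd, joint_goal, q)
        robot.execute(actions[:1])                 # execute, then re-plan next tick
    return False
```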
Simulation Evaluations
We evaluate DRP on over 4000 environments across 5 different categories of tasks, featuring complex static scenes and dynamic obstacles.
Static Environments
Suddenly Appearing Obstacle
Goal Blocking
Dynamic Goal Blocking
Floating Dynamic Obstacle
DRP on Static Environments
These scenarios feature challenging fixed obstacles, evaluating policy performance in predictable, unchanging settings.
Shelf
Dishwasher
Cubby
Box
Cage
Hybrid
Microwave
Wall Cabinet
Success Rate: DRP 84.6% | NeuralMP 50.59% | cuRobo 82.97%
DRP on Suddenly Appearing Obstacle
Obstacles appear suddenly ahead of the robot, directly blocking its path and requiring dynamic trajectory adaptation. This tests the policy's ability to react to unexpected changes in the environment.
Level 1
Level 2
Level 3
Success Rate: DRP 86% | NeuralMP 33.16% | cuRobo 59%
DRP on Goal Blocking
The goal is temporarily obstructed by an obstacle, and the robot must approach as closely as possible without colliding.
Level 1
Level 2
Success Rate: DRP 66.67% | NeuralMP 0% | cuRobo 0%
DRP on Dynamic Goal Blocking
After reaching the goal, the robot encounters a moving obstacle and must avoid it before safely returning to the goal, testing its ability to remain reactive even after task completion.
Level 1
Level 2
Level 3
Level 4
Success Rate: DRP 65.25% | NeuralMP 0.25% | cuRobo 3%
DRP on Floating Dynamic Obstacle
Obstacles move randomly throughout the environment, challenging the robot's reactivity and its ability to avoid collisions in real time.
Level 1
Level 2
Success Rate: DRP 75.5% | NeuralMP 19% | cuRobo 39.5%
Real World Evaluations
In addition to simulation, we evaluate DRP in real-world environments across the same five categories, comparing it to NeuralMP, a state-of-the-art learning-based motion policy, and cuRobo, a state-of-the-art optimization-based motion planner.
DRP on Static Environments
These scenarios feature challenging fixed obstacles, evaluating policy performance in predictable, unchanging settings.
Microwave
Tall Drawer
Front Cabinet
Side Cabinet
Slanted Shelf
Kitchen Shelf
Success Rate: DRP 90% | NeuralMP 30% | cuRobo-Voxels 60%
NeuralMP on Static Environments
Microwave
Tall Drawer
Front Cabinet
Side Cabinet
Slanted Shelf
Kitchen Shelf
cuRobo-Voxels on Static Environments
Microwave
Tall Drawer
Front Cabinet
Side Cabinet
Slanted Shelf
Kitchen Shelf
DRP on Suddenly Appearing Obstacle
Obstacles appear suddenly ahead of the robot, directly blocking its path and requiring dynamic trajectory adaptation. This tests the policy's ability to react to unexpected changes in the environment.
Cluttered — Large Blocker
Cluttered — Small Blocker
Tabletop — Large Blocker
Tabletop — Medium Blocker
Tabletop — Small Blocker
Success Rate: DRP 100% | NeuralMP 6.67% | cuRobo-Voxels 3.33%
Baselines on Suddenly Appearing Obstacle
NeuralMP
cuRobo-Voxels
DRP on Goal Blocking
The goal is temporarily obstructed by an obstacle, and the robot must approach as closely as possible without colliding.
Cluttered — Large Blocker
Cluttered — Small Blocker
Tabletop — Large Blocker
Tabletop — Medium Blocker
Tabletop — Small Blocker
Success Rate: DRP 92.86% | NeuralMP 0% | cuRobo-Voxels 0%
Baselines on Goal Blocking
NeuralMP
cuRobo-Voxels
DRP on Dynamic Goal Blocking
After reaching the goal, the robot encounters a moving obstacle and must avoid it before safely returning to the goal, testing its ability to remain reactive even after task completion.
Cluttered — Side Blocker
Cluttered — Front Blocker
Tabletop — Large Blocker
Tabletop — Medium Blocker
Tabletop — Small Blocker
Success Rate: DRP 93.33% | NeuralMP 0% | cuRobo-Voxels 0%
Baselines on Dynamic Goal Blocking
NeuralMP
cuRobo-Voxels
DRP on Floating Dynamic Obstacle
Obstacles move randomly throughout the environment, challenging the robot's reactivity and its ability to avoid collisions in real time. In this task, we demonstrate DRP's ability to navigate dynamic environments — a capability absent in all prior baselines. Note: During all dynamic evaluations, testers blindfold themselves to avoid seeing the scene, ensuring an unbiased performance assessment.
DRP
NeuralMP
cuRobo-Voxels
Success Rate: DRP 70% | NeuralMP 0% | cuRobo-Voxels 0%
DRP Applications
Language Conditioned Pick-and-Place
We use GroundedDINO+SAM to extract the object's point cloud based on the user-provided prompt. A grasp generation module then proposes a grasp pose. Finally, DRP navigates to the grasp pose while safely avoiding collisions, even in the presence of dynamic obstacles.
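The pipeline above is a simple chain of modules. Below is a hedged Python sketch of how such a chain might be wired together; segment_object, propose_grasp, drp_reach_pose, and the camera intrinsics are hypothetical placeholders standing in for the open-vocabulary segmenter (GroundedDINO+SAM), the grasp generation module, and the DRP goal-reaching policy.

```python
# Hypothetical sketch of the language-conditioned pick-and-place pipeline.
# All function names and intrinsics below are assumptions for illustration.
import numpy as np

def pick_object(prompt, rgb, depth, segment_object, propose_grasp, drp_reach_pose):
    # 1) Open-vocabulary segmentation: text prompt -> object mask.
    mask = segment_object(rgb, prompt)             # e.g. GroundedDINO + SAM
    # 2) Back-project the masked depth pixels into an object point cloud.
    object_pcd = depth_to_pointcloud(depth, mask)
    # 3) Grasp generation: propose a 6-DoF grasp pose on the object point cloud.
    grasp_pose = propose_grasp(object_pcd)
    # 4) DRP: reach the grasp pose while avoiding (possibly dynamic) obstacles.
    return drp_reach_pose(grasp_pose)

def depth_to_pointcloud(depth, mask, fx=600.0, fy=600.0, cx=320.0, cy=240.0):
    """Pinhole back-projection of masked depth pixels (assumed intrinsics)."""
    v, u = np.nonzero(mask)
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)
```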
Collision-Free Teleoperation
The user teleoperates the robot using a space mouse, with goal configurations visualized in green. DRP tracks these goals while ensuring collision-free motion, even when the goal is obstructed by obstacles. This allows the user to control the robot without concern for potential collisions.
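As a rough illustration of this setup, the loop below keeps a user-driven goal pose and hands it to DRP each tick. The space-mouse driver, IK solver, and controller interface are assumed placeholders (the goal is kept as a 6-vector of position plus Euler angles purely for simplicity); DRP then tracks the goal as closely as obstacles allow.

```python
# Minimal sketch of collision-free teleoperation with DRP. spacemouse,
# ik_solve, robot, and drp_controller are illustrative placeholders.
def teleop_loop(spacemouse, robot, drp_controller, ik_solve, steps=10_000):
    goal_pose = robot.get_ee_pose()   # 6-vector: xyz position + Euler angles
    for _ in range(steps):
        # User nudges the goal pose with small space-mouse deltas.
        goal_pose = goal_pose + spacemouse.read_delta()
        joint_goal = ik_solve(goal_pose)          # convert to a joint-space goal
        pcd = robot.get_point_cloud()
        q = robot.get_joint_positions()
        # DRP tracks the (possibly obstructed) goal while staying collision-free,
        # approaching as closely as the obstacles allow rather than colliding.
        actions = drp_controller.step(pcd, joint_goal, q)
        robot.execute(actions[:1])
```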
DRP Failure Cases
The obstacle geometry is significantly outside DRP's training distribution, resulting in a minor collision.
Small goal-blocking obstacles are challenging to avoid. Nevertheless, DRP attempts to slow down the robot in response.
When dynamic obstacles are large and fast-moving, DRP's collision-avoidance performance can degrade.
BibTeX
@article{yang2025deep,
title={Deep Reactive Policy: Learning Reactive Manipulator Motion Planning for Dynamic Environments},
author={Jiahui Yang and Jason Jingzhou Liu and Yulong Li and Youssef Khaky and Kenneth Shaw and Deepak Pathak},
journal={9th Annual Conference on Robot Learning},
year={2025},
}
Acknowledgements
We thank Murtaza Dalal, Ritvik Singh, Arthur Allshire, Tal Daniel, Zheyuan Hu, Mohan Kumar Srirama, and Ruslan Salakhutdinov for their valuable discussions on this work. We are grateful to Karl Van Wyk and Nathan Ratliff for contributing ideas and implementations of Geometric Fabrics used in this project. We also thank Murtaza Dalal for his feedback on the early ideations of this paper. In addition, we thank Andrew Wang, Tony Tao, Hengkai Pan, Tiffany Tse, Sheqi Zhang, and Sungjae Park for their assistance with experiments. This work is supported in part by ONR MURI N00014-22-1-2773, ONR MURI N00014-24-1-2748, and AFOSR FA9550-23-1-0747.
Website borrowed from NeRFies and UMI on Legs under a Creative Commons Attribution-ShareAlike 4.0 International License.