I am building spatial intelligence and reactive dexterity for robots to enable them to truly understand their surroundings and interact with purpose.
At LG Electronics, I lead the Physical Intelligence team, shaping the research roadmap and partnerships to develop Embodied AI technologies for future LG products. We are bringing together a group of AI and robotics researchers passionate about deploying robots in everyday life. Leveraging the vast ecosystem of connected devices and access to data from diverse real-world settings, we focus on training foundation models for autonomy, seamless integration, and trusted operation.
I completed my PhD at MIT and Master’s at Carnegie Mellon. My research focused on dexterous manipulation and multimodal policies for robot control, and received the Best Student Paper Award at RSS, the Amazon Robotics Best Systems Paper Award in Manipulation, and a Best Video Award nomination at ICRA.
Existing robotic systems exhibit a clear tension between generality and precision. Deployed solutions for robotic manipulation tend to fall into the paradigm of one robot solving a single task, lacking precise generalization, i.e., the ability to solve many tasks without compromising on precision. This paper explores solutions for precise and general pick-and-place. In precise pick-and-place, i.e., kitting, the robot transforms an unstructured arrangement of objects into an organized arrangement, which can facilitate further manipulation. We propose simPLE (simulation to Pick Localize and PLacE) as a solution to precise pick-and-place. simPLE learns to pick, regrasp, and place objects precisely, given only the object CAD model and no prior experience. We develop three main components: task-aware grasping, visuotactile perception, and regrasp planning. Task-aware grasping computes affordances of grasps that are stable, observable, and favorable to placing. The visuotactile perception model relies on matching real observations against a set of simulated ones through supervised learning. Finally, we compute the desired robot motion by solving a shortest-path problem on a graph of hand-to-hand regrasps. On a dual-arm robot equipped with visuotactile sensing, we demonstrate pick-and-place of 15 diverse objects with simPLE. The objects span a wide range of shapes, and simPLE achieves successful placements into structured arrangements with 1 mm clearance over 90% of the time for 6 objects, and over 80% of the time for 11 objects.
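The regrasp planning step above can be pictured as a shortest-path search over grasp-to-grasp transfers. Below is a minimal sketch of that idea, assuming hypothetical grasp IDs and toy cost functions; it is an illustration of planning over a graph of hand-to-hand regrasps, not the paper's implementation.

```python
# Hedged sketch: regrasp planning as a shortest-path search over a graph of
# hand-to-hand regrasps. Grasp IDs, costs, and feasibility checks are placeholders.
import networkx as nx

def build_regrasp_graph(grasps_arm1, grasps_arm2, transfer_cost, place_cost):
    """Nodes are (arm, grasp_id); edges are picks, feasible transfers, and places."""
    G = nx.DiGraph()
    G.add_node("start")
    G.add_node("placed")
    for g1 in grasps_arm1:
        G.add_edge("start", ("arm1", g1), weight=0.0)               # initial pick
        G.add_edge(("arm1", g1), "placed", weight=place_cost(g1))   # direct place
        for g2 in grasps_arm2:
            c = transfer_cost(g1, g2)
            if c is not None:                                       # None = infeasible
                G.add_edge(("arm1", g1), ("arm2", g2), weight=c)
                G.add_edge(("arm2", g2), "placed", weight=place_cost(g2))
    return G

# Toy usage: in the full system, the costs would presumably reflect the task-aware
# grasp affordances (stability, observability, placement quality) described above.
grasps_arm1, grasps_arm2 = ["gA", "gB"], ["gC"]
G = build_regrasp_graph(
    grasps_arm1, grasps_arm2,
    transfer_cost=lambda g1, g2: 1.0,
    place_cost=lambda g: 0.5 if g == "gC" else 5.0,
)
plan = nx.shortest_path(G, "start", "placed", weight="weight")
print(plan)  # e.g. ['start', ('arm1', 'gA'), ('arm2', 'gC'), 'placed']
```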
VioLA: Aligning Videos to 2D LiDAR Scans
Jun-Jee Chao, Selim Engin, Nikhil Chavan-Dafle, and 2 more authors
We study the problem of aligning a video that captures a local portion of an environment to the 2D LiDAR scan of the entire environment. We introduce a method (VioLA) that starts with building a semantic map of the local scene from the image sequence, then extracts points at a fixed height for registering to the LiDAR map. Due to reconstruction errors or partial coverage of the camera scan, the reconstructed semantic map may not contain sufficient information for registration. To address this problem, VioLA makes use of a pre-trained text-to-image inpainting model paired with a depth completion model for filling in the missing scene content in a geometrically consistent fashion to support pose registration. We evaluate VioLA on two real-world RGB-D benchmarks, as well as a self-captured dataset of a large office scene. Notably, our proposed scene completion module improves the pose registration performance by up to 20%.
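A minimal sketch of the registration step this builds on, assuming the reconstructed point cloud and the 2D LiDAR map are already available as NumPy arrays: slice the reconstruction at a fixed height and align the slice to the LiDAR points with a basic point-to-point ICP. The heights, tolerances, and the ICP itself are illustrative stand-ins, not VioLA's pipeline, which additionally uses semantics and inpainting-based scene completion.

```python
# Hedged sketch: slice the reconstructed scene at a fixed height and align the
# resulting 2D points to the LiDAR map with a basic point-to-point ICP.
import numpy as np
from scipy.spatial import cKDTree

def slice_at_height(points_xyz, z=0.3, tol=0.05):
    """Keep (x, y) of points whose z lies in a band around the LiDAR scan height."""
    mask = np.abs(points_xyz[:, 2] - z) < tol
    return points_xyz[mask, :2]

def icp_2d(src, dst, iters=30):
    """Estimate R (2x2) and t (2,) aligning src (N,2) to dst (M,2)."""
    R, t = np.eye(2), np.zeros(2)
    tree = cKDTree(dst)
    for _ in range(iters):
        cur = src @ R.T + t
        _, idx = tree.query(cur)                 # nearest-neighbor correspondences
        matched = dst[idx]
        mu_s, mu_d = cur.mean(0), matched.mean(0)
        H = (cur - mu_s).T @ (matched - mu_d)
        U, _, Vt = np.linalg.svd(H)
        dR = Vt.T @ U.T
        if np.linalg.det(dR) < 0:                # enforce a proper rotation
            Vt[-1] *= -1
            dR = Vt.T @ U.T
        dt = mu_d - dR @ mu_s
        R, t = dR @ R, dR @ t + dt               # compose incremental update
    return R, t
```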
Real-time Simultaneous Multi-Object 3D Shape Reconstruction, 6DoF Pose Estimation and Dense Grasp Prediction
Shubham Agrawal, Nikhil Chavan-Dafle, Isaac Kasahara, and 3 more authors
Robotic manipulation systems operating in complex environments rely on perception systems that provide information about the geometry (pose and 3D shape) of the objects in the scene, along with other semantic information such as object labels. This information is then used for choosing feasible grasps on relevant objects. In this paper, we present a novel method that provides this geometric and semantic information for all objects in the scene, as well as feasible grasps on those objects, simultaneously. The main advantage of our method is its speed, as it avoids sequential perception and grasp-planning steps. With detailed quantitative analysis, we show that our method delivers competitive performance compared to state-of-the-art dedicated methods for object shape, pose, and grasp prediction, while providing fast inference at 30 frames per second.
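A minimal sketch of the "single forward pass" idea, assuming a placeholder backbone, head sizes, and output parameterizations; the real model's architecture and detection stage differ.

```python
# Hedged sketch: predict shape, pose, and grasps jointly in one forward pass,
# instead of a sequential perception-then-grasp-planning pipeline.
import torch
import torch.nn as nn

class MultiHeadPerception(nn.Module):
    def __init__(self, feat_dim=256, shape_dim=512, n_grasps=64):
        super().__init__()
        self.backbone = nn.Sequential(                 # stand-in image encoder
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        self.shape_head = nn.Linear(feat_dim, shape_dim)     # e.g. a shape latent code
        self.pose_head = nn.Linear(feat_dim, 7)              # translation + quaternion
        self.grasp_head = nn.Linear(feat_dim, n_grasps * 8)  # grasp poses + qualities

    def forward(self, rgb):
        f = self.backbone(rgb)
        return self.shape_head(f), self.pose_head(f), self.grasp_head(f)

# One forward pass yields all three outputs (per detected object, in practice).
model = MultiHeadPerception()
shape, pose, grasps = model(torch.randn(1, 3, 240, 320))
```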
Simultaneous Object Reconstruction and Grasp Prediction using a Camera-centric Object Shell Representation
Nikhil Chavan-Dafle, Sergiy Popovych, Shubham Agrawal, and 2 more authors
The ability to grasp objects is fundamental to most robotic manipulation systems. In this paper, we present a new approach to simultaneously reconstruct a mesh and a dense grasp quality map of an object from a depth image. At the core of our approach is a novel camera-centric object representation called the "object shell", which is composed of an observed "entry image" and a predicted "exit image". We present an image-to-image residual ConvNet architecture in which the object shell and a grasp-quality map are predicted as separate output channels. The main advantage of the shell representation and the corresponding neural network architecture, ShellGrasp-Net, is that the input-output pixel correspondences in the shell representation are explicitly represented in the architecture. We show that this coupling yields superior generalization for object reconstruction and accurate grasp quality estimation that implicitly accounts for the object geometry. Our approach yields an efficient dense grasp quality map and an object geometry estimate in a single forward pass. Both of these outputs can be used in a wide range of robotic manipulation applications. With rigorous experimental validation, both in simulation and on a real setup, we show that our shell-based method can be used to generate precise grasps and the associated grasp quality with over 90% accuracy. Diverse grasps computed on shell reconstructions allow the robot to select and execute grasps in cluttered scenes with a success rate of more than 93%.
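A minimal sketch of the shell idea, assuming a toy encoder-decoder: the network takes the observed entry depth image and predicts the exit image and a per-pixel grasp-quality map as separate output channels, with the exit image expressed as a residual on the input so input-output pixels stay aligned. It is an illustration, not ShellGrasp-Net itself.

```python
# Hedged sketch: image-to-image residual network predicting the "exit image"
# and a dense grasp-quality map from the observed "entry" depth image.
import torch
import torch.nn as nn

class ShellSketchNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        # Two output channels: exit-depth offset and per-pixel grasp quality.
        self.decoder = nn.Conv2d(64, 2, 3, padding=1)

    def forward(self, entry_depth):
        out = self.decoder(self.encoder(entry_depth))
        exit_depth = entry_depth + out[:, :1]      # residual: exit = entry + offset
        grasp_quality = torch.sigmoid(out[:, 1:])  # quality in [0, 1] per pixel
        return exit_depth, grasp_quality

entry = torch.rand(1, 1, 128, 128)                 # observed depth image
exit_depth, quality = ShellSketchNet()(entry)
```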
Planar In-hand Manipulation via Motion Cones
Nikhil Chavan-Dafle, Rachel Holladay, and Alberto Rodriguez
IJRR [Invited Paper | RSS 2018 Best Student Paper Award], 2020
In this article, we present the mechanics and algorithms to compute the set of feasible motions of an object pushed in a plane. This set is known as the motion cone and was previously described for non-prehensile manipulation tasks in the horizontal plane. We generalize its construction to a broader set of planar tasks, such as those where external forces including gravity influence the dynamics of pushing, or prehensile tasks, where there are complex frictional interactions between the gripper, object, and pusher. We show that the motion cone is defined by a set of low-curvature surfaces and approximate it by a polyhedral cone. We verify its validity with thousands of pushing experiments recorded with a motion tracking system. Motion cones abstract the algebra involved in the dynamics of frictional pushing and can be used for simulation, planning, and control. In this article, we demonstrate their use for the dynamic propagation step in a sampling-based planning algorithm. By constraining the planner to explore only through the interior of motion cones, we obtain manipulation strategies that are robust against bounded uncertainties in the frictional parameters of the system. Our planner generates in-hand manipulation trajectories that involve sequences of continuous pushes, from different sides of the object when necessary, while running 5–1,000 times faster than equivalent algorithms.
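A minimal sketch of how a polyhedral motion cone can gate a sampling-based planner, assuming placeholder cone generators: a sampled object twist is accepted only if it is a nonnegative combination of the generators. In the paper, the generators come from the frictional mechanics of prehensile pushing rather than the arbitrary values used here.

```python
# Hedged sketch: membership test for a polyhedral motion cone given its generators.
import numpy as np
from scipy.optimize import nnls

def in_motion_cone(twist, generators, tol=1e-6):
    """True if twist (3,) = generators (3, k) @ c for some c >= 0, up to tol."""
    _, residual = nnls(generators, twist)   # nonnegative least squares fit
    return residual < tol

# Example: a planar twist (vx, vy, omega) checked against three cone generators.
generators = np.array([[1.0, 0.0, 0.5],
                       [0.0, 1.0, 0.5],
                       [0.0, 0.0, 1.0]]).T          # columns are generators
print(in_motion_cone(np.array([0.5, 0.5, 0.75]), generators))   # inside  -> True
print(in_motion_cone(np.array([-1.0, 0.0, 0.0]), generators))   # outside -> False
```

A planner can call such a check during dynamic propagation to discard sampled pushes whose resulting object motion falls outside the cone.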
Robotic Pick-and-Place of Novel Objects in Clutter with Multi-Affordance Grasping and Cross-Domain Image Matching
Andy Zeng, Shuran Song, Kuan-Ting Yu, and 18 more authors
IJRR [Amazon Robotics Best Systems Paper Award in Manipulation], 2019
This article presents a robotic pick-and-place system that is capable of grasping and recognizing both known and novel objects in cluttered environments. The key new feature of the system is that it handles a wide range of object categories without needing any task-specific training data for novel objects. To achieve this, it first uses an object-agnostic grasping framework to map from visual observations to actions: inferring dense pixel-wise probability maps of the affordances for four different grasping primitive actions. It then executes the action with the highest affordance and recognizes picked objects with a cross-domain image classification framework that matches observed images to product images. Since product images are readily available for a wide range of objects (e.g., from the web), the system works out-of-the-box for novel objects without requiring any additional data collection or re-training. Exhaustive experimental results demonstrate that our multi-affordance grasping achieves high success rates for a wide variety of objects in clutter, and our recognition algorithm achieves high accuracy for both known and novel grasped objects. The approach was part of the MIT–Princeton Team system that took first place in the stowing task at the 2017 Amazon Robotics Challenge.
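A minimal sketch of the action-selection step described above, assuming the four per-primitive affordance maps are already predicted (random arrays stand in for the network outputs, and the primitive names follow the system's four grasping primitives): execute the primitive and pixel with the highest affordance.

```python
# Hedged sketch: pick the grasping primitive and pixel with the highest affordance.
import numpy as np

PRIMITIVES = ["suction-down", "suction-side", "grasp-down", "flush-grasp"]

def select_action(affordance_maps):
    """affordance_maps: (4, H, W) array of per-pixel scores, one map per primitive."""
    flat_idx = np.argmax(affordance_maps)
    prim, row, col = np.unravel_index(flat_idx, affordance_maps.shape)
    return PRIMITIVES[prim], (row, col), affordance_maps[prim, row, col]

maps = np.random.rand(4, 480, 640)      # stand-in for the predicted affordance maps
primitive, pixel, score = select_action(maps)
print(primitive, pixel, round(float(score), 3))
```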
Extrinsic Dexterity: In-hand Manipulation with External Forces
Nikhil Chavan-Dafle, Alberto Rodriguez, Robert Paolini, and 7 more authors
In-hand manipulation is the ability to reposition an object in the hand, for example, when adjusting the grasp of a hammer before hammering a nail. The common approach to in-hand manipulation with robotic hands, known as dexterous manipulation [1], is to hold an object within the fingertips of the hand and wiggle the fingers, or walk them along the object’s surface. Dexterous manipulation, however, is just one of the many techniques available to the robot. The robot can also roll the object in the hand by using gravity, or adjust the object’s pose by pressing it against a surface, or, if fast enough, it can even toss the object in the air and catch it in a different pose. All these techniques have one thing in common: they rely on resources extrinsic to the hand, either gravity, external contacts, or dynamic arm motions. We refer to them as “extrinsic dexterity”. In this paper, we study extrinsic dexterity in the context of regrasp operations, for example when switching from a power to a precision grasp, and we demonstrate that even simple grippers are capable of ample in-hand manipulation. We develop twelve regrasp actions, all open-loop and hand-scripted, and evaluate their effectiveness with over 1,200 trials of regrasps and sequences of regrasps, for three different objects (see video [2]). The long-term goal of this work is to develop a general repertoire of these behaviors, and to understand how such a repertoire might eventually constitute a general-purpose in-hand manipulation capability.
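A minimal sketch of what an open-loop, hand-scripted regrasp can look like in code, with entirely hypothetical waypoints and robot interface: a fixed sequence of arm poses and gripper openings executed without feedback, loosely in the style of pressing the grasped object against a table so it slides within the grasp.

```python
# Hedged sketch: an open-loop, hand-scripted regrasp as a fixed waypoint sequence.
# All poses, gripper widths, and the robot interface are hypothetical placeholders.
PUSH_IN_GRASP_SCRIPT = [
    # (x, y, z, roll, pitch, yaw) [m, rad]     gripper opening [m]
    ((0.50, 0.00, 0.30, 0.0, 3.14, 0.0), 0.02),  # hold object above the table
    ((0.50, 0.00, 0.12, 0.0, 3.14, 0.0), 0.02),  # bring object tip to the table
    ((0.50, 0.00, 0.12, 0.0, 3.14, 0.0), 0.03),  # loosen grip so the object can slide
    ((0.50, 0.00, 0.10, 0.0, 3.14, 0.0), 0.03),  # push down: the table drives the slide
    ((0.50, 0.00, 0.10, 0.0, 3.14, 0.0), 0.02),  # re-tighten grip in the new pose
    ((0.50, 0.00, 0.30, 0.0, 3.14, 0.0), 0.02),  # lift away
]

def execute(robot, script):
    """Run a scripted regrasp open loop: no sensing, just sequential waypoints."""
    for pose, width in script:
        robot.move_to(pose)        # hypothetical arm interface
        robot.set_gripper(width)   # hypothetical gripper interface
```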