Yufei Wang
photo credit to: Liying Qiu
I am a final-year PhD student at the Robotics Institute, Carnegie Mellon University. I am fortunate to be co-advised by Prof. Zackory Erickson and Prof. David Held.
I received my M.S. in Computer Science from the Computer Science Department at Carnegie Mellon University in December 2020, advised by Prof. David Held. Before coming to CMU, I received my B.S. in Data Science from Yuanpei College, Peking University, in July 2019, advised by Prof. Bin Dong.
My general research interest is robot learning for manipulation and its applications in healthcare robotics. My graduate study is supported by the Uber Presidential Fellowship (2022 - 2023) and the SoftBank & Arm Fellowship (2025 - 2026).
I am currently on the job market for academic and industrial research positions starting Fall 2026. If you find my research background a good fit, please feel free to reach out to me.
yufeiw2@andrew.cmu.edu | Google Scholar | Github | Twitter | CV
News
08/2025: Three papers on Robot-Assisted Dressing with Arm Motions, Geometric Red-Teaming for Robotic Manipulation Policies, and Articulated Asset Generation have been accepted to CoRL 2025!
06/2025: Offline RL-VLM-F is accepted to IROS 2025!
04/2025: ArticuBot is accepted to RSS 2025!
09/2024: DiffTORI is accepted to NeurIPS 2024 as a spotlight!
08/2024: Our review paper on deformable object manipulation, Unfolding the Literature, is accepted to the Annual Review of Control, Robotics, and Autonomous Systems.
02/2024: Force-Constrained Visual Policy is accepted to RA-L 2024.
08/2023: Completed a wonderful summer internship at the MIT-IBM Watson AI Lab, hosted by Dr. Chuang Gan.
04/2023: One Policy to Dress Them All is accepted to RSS 2023!
09/2022: ToolFlowNet is accepted to CoRL 2022!
07/2022: Our Visual Haptic Reasoning paper is accepted to RA-L with presentation at IROS 2022!
Publications
(*, †, ‡ indicate equal contribution or equal advising)
Force-Modulated Visual Policy for Robot-Assisted Dressing with Arm Motions
Alexis Yihong Hao, Yufei Wang, Navin Sriram Ravie, Bharath Hegde, David Held†, Zackory Erickson†
Conference on Robot Learning (CoRL), 2025
Project Page / Abstract / Bibtex
Robot-assisted dressing has the potential to significantly improve the lives of individuals with mobility impairments. To ensure an effective and comfortable dressing experience, the robot must be able to handle challenging deformable garments, apply appropriate forces, and adapt to limb movements throughout the dressing process. Prior work often makes simplifying assumptions—such as static human limbs during dressing—which limits real-world applicability. In this work, we develop a robot-assisted dressing system capable of handling partial observations with visual occlusions, as well as robustly adapting to arm motions during the dressing process. Given a policy trained in simulation with partial observations, we propose a method to fine-tune it in the real world using a small amount of data and multi-modal feedback from vision and force sensing, to further improve the policy’s adaptability to arm motions and enhance safety. We evaluate our method in simulation with simplified articulated human meshes and in a real world human study with 12 participants across 264 dressing trials. Our policy successfully dresses two long-sleeve everyday garments onto the participants while being adaptive to various kinds of arm motions, and greatly outperforms prior baselines in terms of task completion and user feedback.
@inproceedings{Hao2025Force,
title={Force-Modulated Visual Policy for Robot-Assisted Dressing with Arm Motions},
author={Hao, Yihong and Wang, Yufei and Ravie, Navin Sriram and Hegde, Bharath and Held, David and Erickson, Zackory},
booktitle={Conference on Robot Learning (CoRL)},
year={2025}}
Geometric Red-Teaming for Robotic Manipulation
Divyam Goel, Yufei Wang, Tiancheng Wu, Helen Qiao, Pavel Piliptchak, David Held†, Zackory Erickson†
Conference on Robot Learning (CoRL), 2025, Oral Presentation
Project Page / Abstract / Bibtex
Standard evaluation protocols in robotic manipulation typically assess policy performance over curated, in-distribution test sets, offering limited insight into how systems fail under plausible variation. We introduce a red-teaming framework that probes robustness through object-centric geometric perturbations, automatically generating CrashShapes---structurally valid, user-constrained mesh deformations that trigger catastrophic failures in pre-trained manipulation policies. The method integrates a Jacobian field–based deformation model with a gradient-free, simulator-in-the-loop optimization strategy. Across insertion, articulation, and grasping tasks, our approach consistently discovers deformations that collapse policy performance, revealing brittle failure modes missed by static benchmarks. By combining task-level policy rollouts with constraint-aware shape exploration, we aim to build a general purpose framework for structured, object-centric robustness evaluation in robotic manipulation. We additionally show that fine-tuning on individual CrashShapes, a process we refer to as blue-teaming, improves task success by up to 60 percentage points on those shapes, while preserving performance on the original object, demonstrating the utility of red-teamed geometries for targeted policy refinement. Finally, we validate both red-teaming and blue-teaming results with a real robotic arm, observing that simulated CrashShapes reduce task success from 90% to as low as 22.5%, and that blue-teaming recovers performance to up to 90% on the corresponding real-world geometry---closely matching simulation outcomes.
@inproceedings{Goel2025Geometric,
title={Geometric Red-Teaming for Robotic Manipulation},
author={Goel, Divyam and Wang, Yufei and Wu, Tiancheng and Qiao, Helen and Piliptchak, Pavel and Held, David and Erickson, Zackory},
booktitle={Conference on Robot Learning (CoRL)},
year={2025}}
Articulate AnyMesh: Open-Vocabulary 3D Articulated Objects Modeling
Xiaowen Qiu, Jincheng Yang, Yian Wang, Zhehuan Chen, Yufei Wang, Tsun-Hsuan Wang, Zhou Xian, Chuang Gan
Conference on Robot Learning (CoRL), 2025
Project Page / Abstract / Bibtex
3D articulated objects modeling has long been a challenging problem, since it requires capturing both accurate surface geometries and semantically meaningful and spatially precise structures, parts, and joints. Existing methods heavily depend on training data from a limited set of handcrafted articulated object categories (e.g., cabinets and drawers), which restricts their ability to model a wide range of articulated objects in an open-vocabulary context. To address these limitations, we propose Articulate AnyMesh, an automated framework that is able to convert any rigid 3D mesh into its articulated counterpart in an open-vocabulary manner. Given a 3D mesh, our framework utilizes advanced Vision-Language Models and visual prompting techniques to extract semantic information, allowing for both the segmentation of object parts and the construction of functional joints. Our experiments show that Articulate AnyMesh can generate large-scale, high-quality 3D articulated objects, including tools, toys, mechanical devices, and vehicles, significantly expanding the coverage of existing 3D articulated object datasets. Additionally, we show that these generated assets can facilitate the acquisition of new articulated object manipulation skills in simulation, which can then be transferred to a real robotic system.
@inproceedings{Qiu2025Articulate,
title={Articulate AnyMesh: Open-Vocabulary 3D Articulated Objects Modeling},
author={Qiu, Xiaowen and Yang, Jincheng and Wang, Yian and Chen, Zhehuan and Wang, Yufei and Wang, Tsun-Hsuan and Xian, Zhou and Gan, Chuang},
booktitle={Conference on Robot Learning (CoRL)},
year={2025}}
Real-World Offline Reinforcement Learning from Vision Language Model Feedback
Sreyas Venkataraman*, Yufei Wang*, Ziyu Wang, Navin Sriram Ravie, Zackory Erickson†, David Held†
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2025
Paper / Abstract / Bibtex
Offline reinforcement learning can enable policy learning from pre-collected, sub-optimal datasets without online interactions. This makes it ideal for real-world robots and safety-critical scenarios, where collecting online data or expert demonstrations is slow, costly, and risky. However, most existing offline RL works assume the dataset is already labeled with the task rewards, a process that often requires significant human effort, especially when ground-truth states are hard to ascertain (e.g., in the real world). In this paper, we build on prior work, specifically RL-VLM-F, and propose a novel system that automatically generates reward labels for offline datasets using preference feedback from a vision-language model and a text description of the task. Our method then learns a policy using offline RL with the reward-labeled dataset. We demonstrate the system's applicability to a complex real-world robot-assisted dressing task, where we first learn a reward function using a vision-language model on a sub-optimal offline dataset, and then use the learned reward to employ Implicit Q-learning to develop an effective dressing policy. Our method also performs well in simulation tasks involving the manipulation of rigid and deformable objects, and significantly outperforms baselines such as behavior cloning and inverse RL. In summary, we propose a new system that enables automatic reward labeling and policy learning from unlabeled, sub-optimal offline datasets.
@inproceedings{Venkataraman2025real,
title={Real-World Offline Reinforcement Learning from Vision Language Model Feedback },
author={Venkataraman, Sreyas and Wang, Yufei and Wang, Ziyu and Ravie, Navin Sriram and Erickson, Zackory and Held, David},
booktitle={IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
year={2025}}
ArticuBot: Learning Universal Articulated Object Manipulation Policy via Large Scale Simulation
Yufei Wang*, Ziyu Wang*, Mino Nakura†, Pratik Bhowal†, Chia-Liang Kuo†, Yi-Ting Chen, Zackory Erickson‡, David Held‡
Robotics: Science and Systems (RSS) 2025
Paper / Project Page / Code / Abstract / Bibtex
This paper presents ArticuBot, in which a single learned policy enables a robotics system to open diverse categories of unseen articulated objects in the real world. This task has long been challenging for robotics due to the large variations in the geometry, size, and articulation types of such objects. Our system, ArticuBot, consists of three parts: generating a large number of demonstrations in physics-based simulation, distilling all generated demonstrations into a point cloud-based neural policy via imitation learning, and performing zero-shot sim2real transfer to real robotics systems. Utilizing sampling-based grasping and motion planning, our demonstration generation pipeline is fast and effective, generating a total of 42.3k demonstrations over 322 training articulated objects. For policy learning, we propose a novel hierarchical policy representation, in which the high-level policy learns the sub-goal for the end-effector, and the low-level policy learns how to move the end-effector conditioned on the predicted goal. We demonstrate that this hierarchical approach achieves much better object-level generalization compared to the non-hierarchical version. We further propose a novel weighted displacement model for the high-level policy that grounds the prediction into the existing 3D structure of the scene, outperforming alternative policy representations. We show that our learned policy can zero-shot transfer to three different real robot settings: a fixed table-top Franka arm across two different labs, and an X-Arm on a mobile base, opening multiple unseen articulated objects across two labs, real lounges, and kitchens.
@inproceedings{Wang2025articubot,
title={ArticuBot: Learning Universal Articulated Object Manipulation Policy via Large Scale Simulation},
author={Wang, Yufei and Wang, Ziyu and Nakura, Mino and Bhowal, Pratik and Kuo, Chia-Liang and Chen, Yi-Ting and Erickson, Zackory and Held, David},
booktitle={Robotics: Science and Systems (RSS)},
year={2025}}
DiffTORI: Differentiable Trajectory Optimization for Deep Reinforcement and Imitation Learning
Weikang Wan*, Ziyu Wang*, Yufei Wang*, Zackory Erickson, David Held
NeurIPS 2024 (Spotlight)
Paper / Code / Abstract / Bibtex
This paper introduces DiffTOP, which utilizes Differentiable Trajectory OPtimization as the policy representation to generate actions for deep reinforcement and imitation learning. Trajectory optimization is a powerful and widely used algorithm in control, parameterized by a cost and a dynamics function. The key to our approach is to leverage the recent progress in differentiable trajectory optimization, which enables computing the gradients of the loss with respect to the parameters of trajectory optimization. As a result, the cost and dynamics functions of trajectory optimization can be learned end-to-end. DiffTOP addresses the "objective mismatch" issue of prior model-based RL algorithms, as the dynamics model in DiffTOP is learned to directly maximize task performance by differentiating the policy gradient loss through the trajectory optimization process. We further benchmark DiffTOP for imitation learning on standard robotic manipulation task suites with high-dimensional sensory observations and compare our method to feed-forward policy classes as well as Energy-Based Models (EBM) and Diffusion. Across 15 model-based RL tasks and 35 imitation learning tasks with high-dimensional image and point cloud inputs, DiffTOP outperforms prior state-of-the-art methods in both domains.
@article{wan2024difftop,
title={DiffTOP: Differentiable Trajectory Optimization for Deep Reinforcement and Imitation Learning},
author={Wan, Weikang and Wang, Ziyu and Wang, Yufei and Erickson, Zackory and Held, David},
journal={Advances in neural information processing systems},
year={2024}
}
Unfolding the Literature: A Review of Robotic Cloth Manipulation
Alberta Longhini, Yufei Wang, Irene Garcia-Camacho, David Blanco-Mulero, Marco Moletta, Michael Welle, Guillem Alenyà, Hang Yin, Zackory Erickson, David Held, Júlia Borràs, Danica Kragic
Annual Review of Control, Robotics, and Autonomous Systems, 2024
Paper / Abstract / Bibtex
The realm of textiles spans clothing, households, healthcare, sports, and industrial applications. The deformable nature of these objects poses unique challenges that prior work on rigid objects cannot fully address. The increasing interest within the community in textile perception and manipulation has led to new methods that aim to address challenges in modeling, perception, and control, resulting in significant progress. However, this progress is often tailored to one specific textile or a subcategory of these textiles. To understand what restricts these methods and hinders current approaches from generalizing to a broader range of real-world textiles, this review provides an overview of the field, focusing specifically on how and to what extent textile variations are addressed in modeling, perception, benchmarking, and manipulation of textiles. We finally conclude by identifying key open problems and outlining grand challenges that will drive future advancements in the field.
@article{longhini2024unfolding,
title={Unfolding the literature: A review of robotic cloth manipulation},
author={Longhini, Alberta and Wang, Yufei and Garcia-Camacho, Irene and Blanco-Mulero, David and Moletta, Marco and Welle, Michael and Aleny{\`a}, Guillem and Yin, Hang and Erickson, Zackory and Held, David and others},
journal={arXiv preprint arXiv:2407.01361},
year={2024}
}
RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation
Yufei Wang*, Zhou Xian*, Feng Chen*, Tsun-Hsuan Wang, Yian Wang, Katerina Fragkiadaki, Zackory Erickson, David Held, Chuang Gan
ICML 2024
Paper / Project Page / Code / Abstract / Bibtex
We present RoboGen, a generative robotic agent that automatically learns diverse robotic skills at scale via generative simulation. RoboGen leverages the latest advancements in foundation and generative models. Instead of directly using or adapting these models to produce policies or low-level actions, we advocate for a generative scheme, which uses these models to automatically generate diversified tasks, scenes, and training supervisions, thereby scaling up robotic skill learning with minimal human supervision. Our approach equips a robotic agent with a self-guided propose-generate-learn cycle: the agent first proposes interesting tasks and skills to develop, and then generates corresponding simulation environments by populating pertinent objects and assets with proper spatial configurations. Afterwards, the agent decomposes the proposed high-level task into sub-tasks, selects the optimal learning approach (reinforcement learning, motion planning, or trajectory optimization), generates required training supervision, and then learns policies to acquire the proposed skill. Our work attempts to extract the extensive and versatile knowledge embedded in large-scale models and transfer them to the field of robotics. Our fully generative pipeline can be queried repeatedly, producing an endless stream of skill demonstrations associated with diverse tasks and environments.
@InProceedings{pmlr-v235-wang24cc,
title = {RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation},
author = {Wang, Yufei and Xian, Zhou and Chen, Feng and Wang, Tsun-Hsuan and Wang, Yian and Fragkiadaki, Katerina and Erickson, Zackory and Held, David and Gan, Chuang},
booktitle = {Proceedings of the 41st International Conference on Machine Learning},
pages = {51936--51983},
year = {2024},
editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
volume = {235},
series = {Proceedings of Machine Learning Research},
month = {21--27 Jul},
publisher = {PMLR},
}
RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback
Yufei Wang*, Zhanyi Sun*, Jesse Zhang, Zhou Xian, Erdem Bıyık, David Held†, Zackory Erickson†
ICML 2024
Paper / Project Page / Code / Abstract / Bibtex
Reward engineering has long been a challenge in Reinforcement Learning research, as it often requires extensive human effort. In this paper, we propose RL-VLM-F, a method that automatically generates reward functions for agents to learn new tasks, using only a text description of the task goal and the agent’s visual observations, by leveraging feedback from vision language foundation models (VLMs). The key to our approach is to query these models to give preferences over pairs of the agent’s image observations based on the text description of the task goal, and then learn a reward function from the preference labels. We demonstrate that RL-VLM-F successfully produces effective rewards and policies across various domains — including classic control, as well as manipulation of rigid, articulated, and deformable objects — without the need for human supervision, outperforming prior methods that use large pretrained models for reward generation under the same assumptions.
@InProceedings{pmlr-v235-wang24bn,
title = {RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback},
author = {Wang, Yufei and Sun, Zhanyi and Zhang, Jesse and Xian, Zhou and Biyik, Erdem and Held, David and Erickson, Zackory},
booktitle = {Proceedings of the 41st International Conference on Machine Learning},
pages = {51484--51501},
year = {2024},
editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
volume = {235},
series = {Proceedings of Machine Learning Research},
month = {21--27 Jul},
publisher = {PMLR},
}
Force-Constrained Visual Policy: Safe Robot-Assisted Dressing via Multi-Modal Sensing
Zhanyi Sun*, Yufei Wang*, David Held†, Zackory Erickson†
Robotics and Automation Letters (RA-L) 2024
Paper / Project Page / Abstract / Bibtex
Robot-assisted dressing could profoundly enhance the quality of life of adults with physical disabilities. To achieve this, a robot can benefit from both visual and force sensing. The former enables the robot to ascertain human body pose and garment deformations, while the latter helps maintain safety and comfort during the dressing process. In this paper, we introduce a new technique that leverages both vision and force modalities for this assistive task. Our approach first trains a vision-based dressing policy using reinforcement learning in simulation with varying body sizes, poses, and types of garments. We then learn a force dynamics model for action planning to ensure safety. Due to limitations of simulating accurate force data when deformable garments interact with the human body, we learn a force dynamics model directly from real-world data. Our proposed method combines the vision-based policy, trained in simulation, with the force dynamics model, learned in the real world, by solving a constrained optimization problem to infer actions that facilitate the dressing process without applying excessive force on the person. We evaluate our system in simulation and in a real-world human study with 10 participants across 240 dressing trials, showing it greatly outperforms prior baselines.
@article{sun2024force,
title={Force-Constrained Visual Policy: Safe Robot-Assisted Dressing via Multi-Modal Sensing},
author={Sun, Zhanyi and Wang, Yufei and Held, David and Erickson, Zackory},
journal={IEEE Robotics and Automation Letters},
year={2024},
publisher={IEEE}
}
One Policy to Dress Them All: Learning to Dress People with Diverse Poses and Garments
Yufei Wang, Zhanyi Sun, Zackory Erickson*, David Held*
RSS 2023
Paper / Project Page / CMU News / Abstract / Bibtex
Robot-assisted dressing could benefit the lives of many people such as older adults and individuals with disabilities. Despite such potential, robot-assisted dressing remains a challenging task for robotics as it involves complex manipulation of deformable cloth in 3D space. Many prior works aim to solve the robot-assisted dressing task, but they make certain assumptions such as a fixed garment and a fixed arm pose that limit their ability to generalize. In this work, we develop a robot-assisted dressing system that is able to dress different garments on people with diverse poses from partial point cloud observations, based on a learned policy. We show that with proper design of the policy architecture and Q function, reinforcement learning (RL) can be used to learn effective policies with partial point cloud observations that work well for dressing diverse garments. We further leverage policy distillation to combine multiple policies trained on different ranges of human arm poses into a single policy that works over a wide range of different arm poses. We conduct comprehensive real-world evaluations of our system with 510 dressing trials in a human study with 17 participants with different arm poses and dressed garments. Our system is able to dress 86% of the length of the participants' arms on average.
@inproceedings{Wang2023One,
title={One Policy to Dress Them All: Learning to Dress People with Diverse Poses and Garments},
author={Wang, Yufei and Sun, Zhanyi and Erickson, Zackory and Held, David},
booktitle={Robotics: Science and Systems (RSS)},
year={2023}}
Elastic Context: Encoding Elasticity for Data-driven Models of Textiles
Alberta Longhini, Marco Moletta, Alfredo Reichlin, Michael C Welle, Alexander Kravberg, Yufei Wang, David Held, Zackory Erickson, Danica Kragic
ICRA 2023
Paper / Abstract / Bibtex
Physical interaction with textiles, such as assistive dressing, relies on advanced dexterous capabilities. The underlying complexity of textile behavior when being pulled and stretched is due to both the yarn material properties and the textile construction technique. Today, there are no commonly adopted and annotated datasets on which the various interaction or property identification methods are assessed. One important property that affects the interaction is material elasticity, which results from both the yarn material and construction technique: these two are intertwined and, if not known a priori, almost impossible to identify through sensing commonly available on robotic platforms. We introduce Elastic Context (EC), a concept that integrates various properties that affect elastic behavior, to enable a more effective physical interaction with textiles. The definition of EC relies on stress/strain curves commonly used in textile engineering, which we reformulated for robotic applications. We employ EC with a Graph Neural Network (GNN) to learn generalized elastic behaviors of textiles. Furthermore, we explore the effect that the dimension of the EC has on accurate force modeling of non-linear real-world elastic behaviors, highlighting the challenges of current robotic setups in sensing textile properties.
@inproceedings{longhini2023elastic,
title={Elastic Context: Encoding Elasticity for Data-driven Models of Textiles},
author={Longhini, Alberta and Moletta, Marco and Reichlin, Alfredo and Welle, Michael C and Kravberg, Alexander and Wang, Yufei and Held, David and Erickson, Zackory and Kragic, Danica},
booktitle={International Conference on Robotics and Automation (ICRA)},
year={2023}
}
ToolFlowNet: Robotic Manipulation with Tools via Predicting Tool Flow from Point Clouds
Daniel Seita, Yufei Wang†, Sarthak J Shetty†, Edward Yao Li†, Zackory Erickson, David Held
CoRL 2022
Paper / Project Page / Code / Abstract / Bibtex
Point clouds are a widely available and canonical data modality which conveys the 3D geometry of a scene. Despite significant progress in classification and segmentation from point clouds, policy learning from such a modality remains challenging, and most prior works in imitation learning focus on learning policies from images or state information. In this paper, we propose a novel framework for learning policies from point clouds for robotic manipulation with tools. We use a novel neural network, ToolFlowNet, which predicts dense per-point flow on the tool that the robot controls, and then uses the flow to derive the transformation that the robot should execute. We apply this framework to imitation learning of challenging deformable object manipulation tasks with continuous movement of tools, including scooping and pouring, and demonstrate significantly improved performance over baselines which do not use flow. We perform physical scooping experiments with ToolFlowNet and find that we can attain 78% scooping success.
@inproceedings{Seita2022toolflownet,
title={ToolFlowNet: Robotic Manipulation with Tools via Predicting Tool Flow from Point Clouds},
author={Seita, Daniel and Wang, Yufei and Shetty, Sarthak and Li, Edward and Erickson, Zackory and Held, David},
booktitle={Conference on Robot Learning},
year={2022}}
Visual Haptic Reasoning: Estimating Contact Forces by Observing Deformable Object Interactions
Yufei Wang, David Held, Zackory Erickson
Robotics and Automation Letters (RA-L) with presentation at IROS 2022
Paper / Project Page / Abstract / Bibtex
Robotic manipulation of highly deformable cloth presents a promising opportunity to assist people with several daily tasks, such as washing dishes; folding laundry; or dressing, bathing, and hygiene assistance for individuals with severe motor impairments. In this work, we introduce a formulation that enables a collaborative robot to perform visual haptic reasoning with cloth -- the act of inferring the location and magnitude of applied forces during physical interaction. We present two distinct model representations, trained in physics simulation, that enable haptic reasoning using only visual and robot kinematic observations. We conducted quantitative evaluations of these models in simulation for robot-assisted dressing, bathing, and dish washing tasks, and demonstrate that the trained models can generalize across different tasks with varying interactions, human body sizes, and object shapes. We also present results with a real-world mobile manipulator, which used our simulation-trained models to estimate applied contact forces while performing physically assistive tasks with cloth.
@article{wang2022visual,
title={Visual Haptic Reasoning: Estimating Contact Forces by Observing Deformable Object Interactions},
author={Wang, Yufei and Held, David and Erickson, Zackory},
journal={IEEE Robotics and Automation Letters},
volume={7},
number={4},
pages={11426--11433},
year={2022},
publisher={IEEE}
}
Learning Visible Connectivity Dynamics for Cloth Smoothing
Xingyu Lin*, Yufei Wang*, Zixuan Huang, David Held (*order by dice)
CoRL 2021
Paper / Project Page / Code / Abstract / Bibtex
Robotic manipulation of cloth remains challenging due to the complex dynamics of cloth, lack of a low-dimensional state representation, and self-occlusions. In contrast to previous model-based approaches that learn a pixel-based dynamics model or a compressed latent vector dynamics, we propose to learn a particle-based dynamics model from a partial point cloud observation. To overcome the challenges of partial observability, we infer which visible points are connected on the underlying cloth mesh. We then learn a dynamics model over this visible connectivity graph. Compared to previous learning-based approaches, our model poses a strong inductive bias with its particle-based representation for learning the underlying cloth physics; it can generalize to cloths with novel shapes; it is invariant to visual features; and the predictions can be more easily visualized. We show that our method greatly outperforms previous state-of-the-art model-based and model-free reinforcement learning methods in simulation. Furthermore, we demonstrate zero-shot sim-to-real transfer where we deploy the model trained in simulation on a Franka arm and show that the model can successfully smooth cloths of different materials, geometries and colors from crumpled configurations. Videos can be found in the supplement and on our project website: https://sites.google.com/view/vcd-cloth.
@inproceedings{lin2021VCD,
title={Learning Visible Connectivity Dynamics for Cloth Smoothing},
author={Lin, Xingyu and Wang, Yufei and Huang, Zixuan and Held, David},
booktitle={Conference on Robot Learning},
year={2021}}
FabricFlowNet: Bimanual Cloth Manipulation with a Flow-based Policy
Thomas Weng, Sujay Bajracharya, Yufei Wang, David Held
CoRL 2021
Paper / Project Page / Code / Abstract / Bibtex
We address the problem of goal-directed cloth manipulation, a challenging task due to the deformability of cloth. Our insight is that optical flow, a technique normally used for motion estimation in video, can also provide an effective representation for corresponding cloth poses across observation and goal images. We introduce FabricFlowNet (FFN), a cloth manipulation policy that leverages flow as both an input and as an action representation to improve performance. FabricFlowNet also elegantly switches between dual-arm and single-arm actions based on the desired goal. We show that FabricFlowNet significantly outperforms state-of-the-art model-free and model-based cloth manipulation policies. We also present real-world experiments on a bimanual system, demonstrating effective sim-to-real transfer. Finally, we show that our method generalizes when trained on a single square cloth to other cloth shapes, such as T-shirts and rectangular cloths.
@inproceedings{weng2021fabricflownet,
title={FabricFlowNet: Bimanual Cloth Manipulation with a Flow-based Policy},
author={Weng, Thomas and Bajracharya, Sujay and Wang, Yufei and Agrawal, Khush and Held, David},
booktitle={Conference on Robot Learning},
year={2021}
}
ROLL: Visual Self-Supervised Reinforcement Learning with Object Reasoning
Yufei Wang*, Gautham Narayan*, Xingyu Lin, Brian Okorn, David Held
CoRL 2020
Paper / Project Page / Code / Abstract / Bibtex
Current image-based reinforcement learning (RL) algorithms typically operate on the whole image without performing object-level reasoning. This leads to inefficient goal sampling and ineffective reward functions. In this paper, we improve upon previous visual self-supervised RL by incorporating object-level reasoning and occlusion reasoning. Specifically, we use unknown object segmentation to ignore distractors in the scene for better reward computation and goal generation; we further enable occlusion reasoning by employing a novel auxiliary loss and training scheme. We demonstrate that our proposed algorithm, ROLL (Reinforcement learning with Object Level Learning), learns dramatically faster and achieves better final performance compared with previous methods in several simulated visual control tasks. Project video and code are available at https://sites.google.com/andrew.cmu.edu/roll.
@inproceedings{corl2020roll,
title={ROLL: Visual Self-Supervised Reinforcement Learning with Object Reasoning},
author={Wang, Yufei and Narasimhan, Gautham and Lin, Xingyu and Okorn, Brian and Held, David},
booktitle={Conference on Robot Learning},
year={2020}
}
SoftGym: Benchmarking Deep Reinforcement Learning for Deformable Object Manipulation
Xingyu Lin, Yufei Wang, Jake Olkin, David Held
CoRL 2020
Paper / Project Page / Code / Abstract / Bibtex
Manipulating deformable objects has long been a challenge in robotics due to its high dimensional state representation and complex dynamics. Recent success in deep reinforcement learning provides a promising direction for learning to manipulate deformable objects with data driven methods. However, existing reinforcement learning benchmarks only cover tasks with direct state observability and simple low-dimensional dynamics or with relatively simple image-based environments, such as those with rigid objects. In this paper, we present SoftGym, a set of open-source simulated benchmarks for manipulating deformable objects, with a standard OpenAI Gym API and a Python interface for creating new environments. Our benchmark will enable reproducible research in this important area. Further, we evaluate a variety of algorithms on these tasks and highlight challenges for reinforcement learning algorithms, including dealing with a state representation that has a high intrinsic dimensionality and is partially observable. The experiments and analysis indicate the strengths and limitations of existing methods in the context of deformable object manipulation that can help point the way forward for future methods development. Code and videos of the learned policies can be found on our project website.
@inproceedings{corl2020softgym,
title={SoftGym: Benchmarking Deep Reinforcement Learning for Deformable Object Manipulation},
author={Lin, Xingyu and Wang, Yufei and Olkin, Jake and Held, David},
booktitle={Conference on Robot Learning},
year={2020}
}
f-IRL: Inverse Reinforcement Learning via State Marginal Matching
Tianwei Ni*, Harshit Sikchi*, Yufei Wang*, Tejus Gupta*, Lisa Lee†, Ben Eysenbach† (*order by dice, † equal advising)
CoRL 2020. Abridged in the RSS 2020 Workshop on Structured Approaches to Robot Learning for Improved Generalization.
Paper / Project Page / Code / Abstract / Bibtex
Imitation learning is well-suited for robotic tasks where it is difficult to directly program the behavior or specify a cost for optimal control. In this work, we propose a method for learning the reward function (and the corresponding policy) to match the expert state density. Our main result is the analytic gradient of any f-divergence between the agent and expert state distribution w.r.t. reward parameters. Based on the derived gradient, we present an algorithm, f-IRL, that recovers a stationary reward function from the expert density by gradient descent. We show that f-IRL can learn behaviors from a hand-designed target state density or implicitly through expert observations. Our method outperforms adversarial imitation learning methods in terms of sample efficiency and the required number of expert trajectories on IRL benchmarks. Moreover, we show that the recovered reward function can be used to quickly solve downstream tasks, and empirically demonstrate its utility on hard-to-explore tasks and for behavior transfer across changes in dynamics.
@inproceedings{firl2020corl,
title={f-IRL: Inverse Reinforcement Learning via State Marginal Matching},
author={Ni, Tianwei and Sikchi, Harshit and Wang, Yufei and Gupta, Tejus and Lee, Lisa and Eysenbach, Ben},
booktitle={Conference on Robot Learning},
year={2020}
}
Meta-SAC: Auto-tune the Entropy Temperature of Soft Actor-Critic via Metagradient
Yufei Wang*, Tianwei Ni*
ICML 2020, Workshop on Automated Machine Learning [Link]
Paper / Video / Code / Abstract
The exploration-exploitation dilemma has long been a crucial issue in reinforcement learning. In this paper, we propose a new approach to automatically balance between these two. Our method is built upon the Soft Actor-Critic (SAC) algorithm, which uses an "entropy temperature" that balances the original task reward and the policy entropy, and hence controls the trade-off between exploitation and exploration. It is empirically shown that SAC is very sensitive to this hyperparameter, and the follow-up work (SAC-v2), which uses constrained optimization for automatic adjustment, has some limitations. The core of our method, namely Meta-SAC, is to use metagradient along with a novel meta objective to automatically tune the entropy temperature in SAC. We show that Meta-SAC achieves promising performance on several of the MuJoCo benchmark tasks, and outperforms SAC-v2 by over 10% on one of the most challenging tasks, Humanoid-v2.
Beyond Exponentially Discounted Sum: Automatic Learning of Return Function
Yufei Wang*, Qiwei Ye*, Tie-Yan Liu
NeurIPS 2020 Deep RL workshop
Paper / Abstract
In reinforcement learning, the return, which is the weighted accumulation of future rewards, and the value, which is the expected return, serve as the objective that guides the learning of the policy. In classic RL, return is defined as the exponentially discounted sum of future rewards. One key insight is that there could be many feasible ways to define the form of the return function (and thus the value), from which the same optimal policy can be derived, yet these different forms might render dramatically different speeds of learning this policy. In this paper, we study how to modify the form of the return function to enhance the learning towards the optimal policy. We propose to use a general mathematical form for the return function, and employ meta-learning to learn the optimal return function in an end-to-end manner. We test our methods on a specially designed maze environment and several Atari games, and our experimental results clearly indicate the advantages of automatically learning optimal return functions in reinforcement learning.
Learning to Discretize: Solving 1D Scalar Conservation Laws via Deep Reinforcement Learning
Yufei Wang*, Ziju Shen*, Zichao Long, Bin Dong
Communications in Computational Physics 2020
Paper / Code / Abstract
Conservation laws are considered to be fundamental laws of nature. They have broad applications in many fields, including physics, chemistry, biology, geology, and engineering. Solving the differential equations associated with conservation laws is a major branch in computational mathematics. The recent success of machine learning, especially deep learning in areas such as computer vision and natural language processing, has attracted a lot of attention from the community of computational mathematics and inspired many intriguing works in combining machine learning with traditional methods. In this paper, we are the first to view numerical PDE solvers as an MDP and to use (deep) RL to learn new solvers. As proof of concept, we focus on 1-dimensional scalar conservation laws. We deploy the machinery of deep reinforcement learning to train a policy network that can decide on how the numerical solutions should be approximated in a sequential and spatial-temporal adaptive manner. We will show that the problem of solving conservation laws can be naturally viewed as a sequential decision-making process, and the numerical schemes learned in such a way can easily enforce long-term accuracy. Furthermore, the learned policy network is carefully designed to determine a good local discrete approximation based on the current state of the solution, which essentially makes the proposed method a meta-learning approach. In other words, the proposed method is capable of learning how to discretize for a given situation, mimicking human experts. Finally, we will provide details on how the policy network is trained, how well it performs compared with some state-of-the-art numerical solvers such as WENO schemes, as well as the supervised-learning-based approaches L3D and PINN, and how well it generalizes.
Deep Reinforcement Learning for Green Security Games with Real-Time Information
Yufei Wang, Zheyuan Ryan Shi, Lantao Yu, Yi Wu, Rohit Singh, Lucas Joppa, Fei Fang
AAAI 2019
Paper / Slides / Abstract / Bibtex
Green Security Games (GSGs) have been proposed and applied to optimize patrols conducted by law enforcement agencies in green security domains such as combating poaching, illegal logging and overfishing. However, real-time information such as footprints and agents' subsequent actions upon receiving the information, e.g., rangers following the footprints to chase the poacher, have been neglected in previous work. To fill the gap, we first propose a new game model GSG-I which augments GSGs with sequential movement and the vital element of real-time information. Second, we design a novel deep reinforcement learning-based algorithm, DeDOL, to compute a patrolling strategy that adapts to the real-time information against a best-responding attacker. DeDOL is built upon the double oracle framework and the policy-space response oracle, solving a restricted game and iteratively adding best response strategies to it through training deep Q-networks. Exploring the game structure, DeDOL uses domain-specific heuristic strategies as initial strategies and constructs several local modes for efficient and parallelized training. To our knowledge, this is the first attempt to use Deep Q-Learning for security games.
@inproceedings{wang2019deep,
title={Deep reinforcement learning for green security games with real-time information},
author={Wang, Yufei and Shi, Zheyuan Ryan and Yu, Lantao and Wu, Yi and Singh, Rohit and Joppa, Lucas and Fang, Fei},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={33},
number={01},
pages={1401--1408},
year={2019}
}