Zhutian (Skye) Yang
/ ju tin-yen young /
oxskye {at} gmail {dot} com
Hello! My name is Zhutian (Skye). I'm a Research Scientist at Google DeepMind Robotics. I obtained my PhD in robotics at MIT, working with Leslie Pack Kaelbling and Tomás Lozano-Pérez. I develop algorithms for solving long-horizon manipulation problems in geometrically complex environments, using a combination of deep learning and planning methods. I am an ex-NVIDIA intern in the Seattle Robotics Lab and an ex-TRI intern in the Large Behavior Models team. I obtained my bachelor's degree in information engineering and media at NTU, Singapore.
News:
- [04/14/25]: Successfully defended PhD on Learning to Solve Long-Horizon Manipulation Problems!
Research
Robots exhibiting long-horizon behavior, such as unpacking grocery bags and heating up takeout, must be able to plan quickly and execute robustly in semantically rich, geometrically complex environments. Enabling robots to perform language-instructed, multi-step mobile manipulation tasks requires foundation models for planning and acting. Training and evaluating such models in simulation for diverse robot embodiments holds significant economic and scientific potential. Towards building fully autonomous and intelligent robot systems, I've worked on the following problems in the field of robot learning and planning:
- Developing generalizable long-horizon visuomotor policies.
  - Ongoing at TRI
    Developed a hierarchical multi-task policy architecture where a Robot Visual Planning Network generates language goals and guides a low-level language-conditioned multi-skill policy.
  - PoPi
    Chain imitation-learned policies conditioned on waypoints generated by motion planning to solve multi-step mobile manipulation of objects with unknown dynamics.
- Generating long-horizon manipulation trajectories by combining learning-based methods and planning techniques.
  - VLM-TAMP
    Solve long-horizon manipulation problems for any robot embodiment by combining the geometric reasoning ability of TAMP with the common-sense knowledge of pre-trained VLMs.
  - Diffusion-CCSP
    Learn to plan motions while satisfying geometric collision-free, physical stability, and data-defined spatial constraints (e.g. packing non-convex objects in boxes).
  - PiGi
    Learn to predict task plan feasibility from images of environments with articulated and movable obstacles using a transformer trained on a wide variety of procedurally generated scenes.
Publications
- (PhD Thesis) Learning to Solve Long-Horizon Manipulation Problems
Zhutian Yang
Thesis Committee: Leslie Pack Kaelbling, Tomás Lozano-Pérez, Caelan Reed Garrett, Danfei Xu
- Guiding Long-Horizon Task and Motion Planning with Vision Language Models
Zhutian Yang, Caelan Reed Garrett, Leslie Pack Kaelbling, Tomás Lozano-Pérez, and Dieter Fox
ICRA 2025; CoRL 2024 LangRob Workshop (Spotlight)
TLDR: Pretrained VLMs make mistakes in predicting robot actions when prompted with open language goals, so we use VLMs to break down long-horizon goals into subgoals, which are then solved by TAMP in an iterative replanning system. The system solves problems that involve interactions with 20+ objects and require 30-50 actions to complete.
Paper | Project Page | Code | Bibtex | Talk
- Combining Planning and Diffusion for Mobility with Unknown Dynamics
Yajvan Ravan, Zhutian Yang, Tao Chen, Leslie Pack Kaelbling, and Tomás Lozano-Pérez
In Submission
TLDR: Rearranging large objects with unpredictable dynamics is hard because the relative pose between the robot and the object keeps changing. Diffusion policies that output global robot configurations struggle to generalize to new initial and goal conditions, or to new environments and objects. So we use motion planning to generate waypoints that guide a local diffusion policy, which is trained to achieve relative movements of the object (e.g., a chair).
Paper | Project Page | Bibtex
- Compositional Diffusion-Based Continuous Constraint Solvers
Zhutian Yang, Jiayuan Mao, Yilun Du, Jiajun Wu, Joshua Brett Tenenbaum, Tomás Lozano-Pérez, and Leslie Pack Kaelbling
CoRL 2023
TLDR: Multi-step manipulation problems involve many collision-free, physical stability, and culture-defined spatial constraints. Conventional methods usually solve them by sampling and rejection, which is too slow. Therefore, we find global solutions by diffusion-based optimization, using diffusion models trained for each constraint type.
Paper | Project Page | Code | Bibtex | Talk | MIT News
- Sequence-Based Plan Feasibility Prediction for Efficient Task and Motion Planning
Zhutian Yang, Caelan Reed Garrett, Leslie Pack Kaelbling, Tomás Lozano-Pérez, and Dieter Fox
RSS 2023
TLDR: In long-horizon mobile manipulation problems in complex environments with many articulated and movable obstacles, task and motion planners spend most of their computation on motion planning problems that aren't solvable. So we train a plan feasibility prediction model that quickly sorts candidate plans by their likelihood of success using visual and language features of the problem, which cuts planning time by 50-80%.
🔥 We won Best Paper Runner-Up at the CoRL 2022 Workshop on Learning, Perception, and Abstraction for Long-Horizon Planning
Paper | Project Page | Code | Bibtex | Talk | MIT News | Tech Crunch
- Let’s Handle It: Generalizable Manipulation of Articulated Objects
Zhutian Yang and Aidan Curtis
ICLR 2022 Workshop on Generalizable Policy Learning in the Physical World (Spotlight)
🔥 We won 2nd place in the ManiSkill Challenge 2022 Robotics Track
Paper
- Zhutian Yang, Patrick Henry Winston, and David Hsu
Undergraduate thesis work; Also appeared in Advances in Cognitive Systems 2019 and DSpace@MIT
Services
- Reviewed for RA-L, IJRR, ICLR, IROS, AAAI, ICRA, CoRL, and RSS.
- Co-organized ICRA 2024 Workshop on Vision-Language Models for Navigation and Manipulation (VLMNM).
- Co-organized RSS 2023 Workshop on Learning for Task and Motion Planning (LTAMP).
- Served as a Student Counselor in EECS Resources for Easing Friction and Stress from 2020 to 2024, helping 10+ graduate students through stressful situations such as changing advisors and family conflicts.
- Served as a Teaching Assistant for MIT 6.036 Introduction to Machine Learning in Spring 2022.
Misc.
- I'm an AFAA certified group exercise instructor, specializing in kickboxing.
- If thrown out of robotics research and threatened to never code again, I would do improv musical comedy.
- Wait, I could actually fight back.