I received a Ph.D. and M.S. from Stanford University, advised by Fei-Fei Li and Silvio Savarese, and a bachelor's degree from Tsinghua University. Before joining Cornell, I was a postdoc at UC Berkeley, working with Sergey Levine. I have also spent time at RAI Institute, Google Brain, Google X Robotics, and Microsoft Research Asia.
Prospective students: I am actively seeking motivated students and postdocs with relevant backgrounds to join our lab. If you are interested in working with me, please review the information on this page before reaching out.
Research
My research aims to enable robots to perform diverse and complex tasks in unstructured environments using deep learning. To achieve this, my lab and I develop scalable algorithms and systems for robot perception and control with the following focuses:
- Learning versatile sensorimotor skills from large-scale, diverse multimodal data.
- Autonomous data generation and collection to continually expand and refine robot capabilities.
- Boosting generalization to novel environments, tasks, and robots by incorporating prior knowledge from broad sources.
News
- [Dec 2025] Prospective students can now apply to both the Cornell CS and Robotics PhD programs!
- [Nov 2025] We received the Amazon Research Award. Thank you, Amazon!
- [Nov 2024] I am serving on the organizing committee of CoRL 2025.
- [Nov 2024] We are organizing the Northeast Robotics Colloquium (NERC) 2025.
- [July 2024] I have started as an Assistant Professor at Cornell CS.
- [Dec 2023] I am serving as a Program Chair Assistant at NeurIPS 2023.
Teaching
- [Spring 2025] Instructor, CS 4756: Robot Learning, Cornell University
- [Fall 2024] Instructor, CS 6758: Deep Learning for Robotics, Cornell University
- [Winter 2021] TA, CS 231A: Computer Vision, From 3D Reconstruction to Recognition, Stanford University
- [Summer 2020] Instructor, Stanford AI4ALL Program, Stanford University
- [Winter 2018] TA, CS 231A: Computer Vision, From 3D Reconstruction to Recognition, Stanford University
Publications
(*: equal contribution, †: equal advising)
- Correspondence-Oriented Imitation Learning: Flexible Visuomotor Control with 3D Conditioning
- Planning-Guided Diffusion Policy Learning for Generalizable Contact-Rich Bimanual Manipulation
- Versatile Loco-Manipulation through Flexible Interlimb Coordination
  Conference on Robot Learning (CoRL) 2025 (Oral Presentation)
- Temporal Representation Alignment: Emergent Compositionality in Instruction Following with Successor Features
- Prompting with the Future: Open-World Model Predictive Control with Interactive Digital Twins
- Blox-Net: Generative Design-for-Robot-Assembly using VLM Supervision, Physics Simulation, and A Robot with Reset
- KALIE: Fine-Tuning Vision-Language Models for Open-World Manipulation without Robot Data
- Affordance-Guided Reinforcement Learning via Visual Prompting
- Should We Learn Contact-Rich Manipulation Policies from Sampling-Based Planners?
- Policy Adaptation via Language Optimization: Decomposing Tasks for Few-Shot Imitation
- Jacta: A Versatile Planner for Learning Dexterous and Whole-Body Manipulation
- MOKA: Open-World Robotic Manipulation through Mark-Based Visual Prompting
- Open X-Embodiment: Robotic Learning Datasets and RT-X Models
- Stabilizing Contrastive RL: Techniques for Offline Goal Reaching
- Multi-Stage Cable Routing Through Hierarchical Imitation Learning
- Goal Representations for Instruction Following: A Semi-Supervised Language Interface to Control
- BridgeData V2: A Dataset for Robot Learning at Scale
- Active Task Randomization: Learning Robust Skills Via Unsupervised Generation of Diverse and Feasible Tasks
- Generalization with Lossy Affordances: Leveraging Broad Offline Data for Learning Visuomotor Tasks
  Conference on Robot Learning (CoRL) 2022 (Oral Presentation)
- Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space
- Discovering Generalizable Skills via Automated Generation of Diverse Tasks
- Synergies Between Affordance and Geometry: 6-DoF Grasp Detection via Implicit Representations
- Adaptive Procedural Task Generation for Hard-Exploration Problems
- KETO: Learning Keypoint Representations for Tool Manipulation
- Dynamics Learning with Cascaded Variational Inference for Multi-Step Manipulation
  Conference on Robot Learning (CoRL) 2019 (Oral Presentation)
- Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks
- Learning Task-Oriented Grasping for Tool Manipulation from Simulated Self-Supervision
- Multi-Task Domain Adaptation for Deep Learning of Instance Grasping from Simulation
- Demo2Vec: Reasoning Object Affordances from Online Videos
- Recurrent Autoregressive Networks for Online Multi-Object Tracking
- DeLay: Robust Spatial Layout Estimation for Cluttered Indoor Scenes