I am a second-year master's student (MS in Robotics) at the Robotics Institute, Carnegie Mellon University, advised by Prof. Jean Oh.
Previously, I obtained my bachelor's degree from Yuanpei College 🦁, Peking University, majoring in Data Science (Computer Science + Statistics).
During my undergraduate years, I was honored to be advised by Prof. He Wang and privileged to work closely with Prof. Chuang Gan.
I am always happy to chat and explore opportunities for collaboration. Feel free to reach out to me!
I am seeking PhD positions starting in Fall 2026. If you have any suggestions or opportunities, please let me know!
My research lies at the intersection of Robotics and Machine Learning, with a focus on Robot Manipulation.
My current research centers on the following perspectives:
Modeling consistent geometric and dynamic structures to enable general and adaptable interaction.
Developing robust real-world perception that captures key structures for reliable interaction.
Building structured scene understanding from heterogeneous data as a knowledge foundation for interaction.
Core research questions I am exploring:
How can we develop structured inductive biases from geometry, physics, causality, and beyond to build effective, efficient, and scalable robot learning systems that enable general and adaptable robot interaction?
How can we combine general, robust learning-based designs with solid engineering to enable robots to operate as reliable, integrated systems in unstructured real-world environments?
TL;DR: A scalable DiT-based VLA policy with an in-context conditioning mechanism for inherent action denoising, enabling fine-grained alignment between denoised actions and raw visual tokens from historical observations.
TL;DR: A unified infrastructure with simulator-agnostic interfaces to a wide range of simulators, aiming to enable universal configuration and hybrid simulation for scalable and generalizable robot learning.
TL;DR: Robust depth perception for elementary functional structures and adaptable manipulation via online planning across potential interaction modes for articulated objects.
TL;DR: Learning shared structured alignment across heterogeneous multi-modal data to build structured graph-based scene understanding capabilities.
GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts
Haoran Geng*, Helin Xu*, Chengyang Zhao*, Chao Xu, Li Yi, Siyuan Huang, He Wang (* The order is determined by rolling dice.)
Conference on Computer Vision and Pattern Recognition (CVPR), 2023 (Highlight)
[paper] [website] [code] [dataset]
TL;DR: Modeling shared elementary functional structures that remain geometrically consistent across various articulated object categories for generalizable perception and manipulation.
Thanks to Jon Barron for this amazing template :D
Last Updated: Dec. 2025