Li Yi (弋力)
Tenure-track Assistant Professor at Tsinghua University
Assistant Professor and Ph.D. Advisor, Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University
Email: ericyi0124 at gmail dot com
About
- I am an Assistant Professor in the Institute for Interdisciplinary Information Sciences (IIIS) at Tsinghua University. I received my Ph.D. from Stanford University, advised by Professor Leonidas J. Guibas. After graduation, I spent a wonderful time at Google as a Research Scientist, working closely with Professor Thomas Funkhouser. Prior to joining Stanford, I received my bachelor's degree in Electronic Engineering from Tsinghua University.
- My recent research interests focus on 3D perception, humanoid robot learning, and human-robot interaction, with the goal of equipping robotic agents with the ability to understand and interact with the 3D world.
Recruiting
- I am actively looking for motivated visiting students, interns, PhD students, and postdocs. Please feel free to email me if you are interested.
- For PhD applicants, please contact me at least half a year prior to your application.
- For visiting students or research interns, we have openings for long-term internships (six months or longer). Both undergraduate and graduate students are welcome. Please email me with your CV and transcript to apply.
News
- NEW [2025/09] Two papers accepted to NeurIPS 2025.
- NEW [2025/06] Four papers accepted to ICCV 2025.
- NEW [2025/06] I am co-organizing the Human-Robot-Scene Interaction and Collaboration (HRSIC) Workshop at ICCV 2025.
- NEW [2025/02] Five papers accepted to CVPR 2025.
- NEW [2025/01] Two papers accepted to ICLR 2025.
- NEW [2024/12] I have been invited to speak at the third Workshop on Reconstruction of Human-Object Interactions (RHOBIN) at CVPR 2025.
- [2024/12] I am organizing the 1st Workshop on Humanoid Agents at CVPR 2025.
- [2024/11] I was invited to speak at the Learning Robot Fine and Dexterous Manipulation Workshop at CoRL 2024 (video recording).
- [2024/11] Two papers accepted to 3DV 2025.
- [2024/09] One paper accepted to NeurIPS 2024.
- [2024/07] Three papers accepted to ECCV 2024 and one paper accepted to ACMMM 2024 as oral.
- [2024/03] Four papers accepted to CVPR 2024.
- [2024/01] Two papers accepted to ICRA 2024 with one also accepted to Robotics and Automation Letters (RA-L).
- [2024/01] Two papers accepted to ICLR 2024 with one as spotlight.
- [2023/12] Two papers accepted to AAAI 2024.
- [2023/07] Four papers accepted to ICCV 2023.
- [2023/06] I will serve as an Area Chair for CVPR 2024.
- [2023/04] One paper accepted to ICML 2023 and one paper accepted to SIGGRAPH 2023.
- [2023/03] I serve as an Area Chair for NeurIPS 2023.
- [2023/03] Seven papers accepted to CVPR 2023.
- [2023/01] Two papers accepted to ICLR 2023.
- [2023/01] Two papers accepted to AAAI 2023 as orals.
- [2022/10] I serve as an Area Chair for CVPR 2023.
- [2022/09] One paper accepted to ECCV 2022 and one paper accepted to SIGGRAPH Asia 2022.
- [2022/03] Seven papers accepted to CVPR 2022.
- [2021/09] Two papers accepted to NeurIPS 2021 and one paper accepted to ICCV 2021.
- [2021/05] I serve as an Area Chair for CVPR 2022.
- [2021/05] I am organizing The 1st Workshop on Simulation Technology for Embodied AI at ICCV 2021.
- [2021/03] Two papers accepted at CVPR 2021 (one oral included).
Recent Projects
*: equal contribution, †: corresponding author
- Deep Object-Centric 3D Perception
- Unleashing Humanoid Reaching Potential via Real-world-Ready Skill Space
- SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation
- DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge
- SyncDiff: Synchronized Motion Diffusion for Multi-Body Human-Object Interaction Synthesis
- 4DSegStreamer: Streaming 4D Panoptic Segmentation via Dual Threads
- DexVLG: Dexterous Vision-Language-Grasp Model at Scale
- Self-Supervised Monocular 4D Scene Reconstruction for Egocentric Videos
- Learning Physics-Based Full-Body Human Reaching and Grasping from Brief Walking References
- CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement
- MobileH2R: Learning Generalizable Human to Mobile Robot Handover Exclusively from Scalable and Diverse Synthetic Data
- MAP: Unleashing Hybrid Mamba-Transformer Vision Backbone's Potential with Masked Autoregressive Pretraining
- PartRM: Modeling Part-Level Dynamics with Large Cross-State Reconstruction Model
- DexTrack: Towards Generalizable Neural Tracking Control for Dexterous Manipulation from Human References
- VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
- Interactive Humanoid: Online Full-Body Motion Reaction Synthesis with Social Affordance Canonicalization and Forecasting
- ImOV3D: Learning Open Vocabulary Point Clouds 3D Object Detection from Only 2D Images
- ShapeLLM: Universal 3D Object Understanding for Embodied Interaction
- QuasiSim: Parameterized Quasi-Physical Simulators for Dexterous Manipulations Transfer
- FreeMotion: MoCap-Free Human Motion Synthesis with Multimodal Large Language Models
- PhysReaction: Physically Plausible Real-Time Humanoid Reaction Synthesis via Forward Dynamics Guided 4D Imitation
- GenH2R: Learning Generalizable Human-to-Robot Handover via Scalable Simulation, Demonstration, and Imitation
- TACO: Benchmarking Generalizable Bimanual Tool-ACtion-Object Understanding
- GenN2N: Generative NeRF2NeRF Translation
- Physics-aware Hand-object Interaction Denoising
- DreamLLM: Synergistic Multimodal Comprehension and Creation
- GeneOH Diffusion: Towards Generalizable Hand-Object Interaction Denoising via Denoising Diffusion
- Enhancing Generalizable 6D Pose Tracking of an In-Hand Object with Tactile Sensing
- CrossVideo: Self-supervised Cross-modal Contrastive Learning for Point Cloud Video Understanding
- Semantic Complete Scene Forecasting from a 4D Dynamic Point Cloud Sequence
- Full-Body Motion Reconstruction with Sparse Sensing from Graph Perspective
- NSM4D: Neural Scene Model Based Online 4D Point Cloud Sequence Understanding
- TransTouch: Learning Transparent Objects Depth Sensing Through Sparse Touches
- LeaF: Learning Frames for 4D Point Cloud Sequence Understanding
- Few-Shot Physically-Aware Articulated Mesh Generation via Hierarchical Deformation
- UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-aware Curriculum and Iterative Generalist-Specialist Learning
- 3D Implicit Transporter for Temporally Consistent Keypoint Discovery
- ArrangementNet: Learning Scene Arrangements for Vectorized Indoor Scene Modeling
- Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining
- Complete-to-Partial 4D Distillation for Self-Supervised Point Cloud Sequence Representation Learning
- CAMS: CAnonicalized Manipulation Spaces for Category-Level Functional Hand-Object Manipulation Synthesis
- JacobiNeRF: NeRF Shaping with Mutual Information Gradients
- GAPartNet: Learning Generalizable and Actionable Parts for Cross-Category Object Perception and Manipulation
- UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy
- SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer
- Semi-Weakly Supervised Object Kinematic Motion Prediction
- Self-Supervised Category-Level Articulated Object Pose Estimation with Part-Level SE(3) Equivariance
- Autoencoders as Cross-Modal Teachers: Can Pretrained 2D Image Transformers Help 3D Representation Learning?
- Language-Assisted 3D Feature Learning for Semantic Scene Understanding
- Tracking and Reconstructing Hand Object Interactions from Point Cloud Sequences in the Wild
- MoRig: Motion-Aware Rigging of Character Meshes from Point Clouds
- Point Primitive Transformer for Long-Term 4D Point Cloud Video Understanding
- HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction
- Rotationally Equivariant 3D Object Detection
- AutoGPart: Intermediate Supervision Search for Generalizable 3D Part Segmentation
- CodedVTR: Codebook-based Sparse Voxel Transformer with Geometric Guidance
- Multi-Robot Active Mapping via Neural Bipartite Graph Matching
- APES: Articulated Part Extraction from Sprite Sheets
- Fixing Malfunctional Objects With Learned Physical Simulation and Functional Prediction
- PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning
- Leveraging SE(3) Equivariance for Self-supervised Category-Level Object Pose Estimation from Point Clouds
- Contrastive Multimodal Fusion with TupleInfoNCE
- P4Contrast: Contrastive Learning with Pairs of Point-Pixel Pairs for RGB-D Scene Understanding
- Compositionally Generalizable 3D Structure Prediction
- Robust Neural Routing Through Space Partitions for Camera Relocalization in Dynamic Indoor Environments
- Complete & Label: A Domain Adaptation Approach to Semantic Segmentation of LiDAR Point Clouds
- Rethinking Sampling in 3D Point Cloud Generative Adversarial Networks
- Curriculum DeepSDF
- SAPIEN: A SimulAted Part-based Interactive ENvironment
- Category-Level Articulated Object Pose Estimation
- StructEdit: Learning Structural Shape Variations
- AdaCoSeg: Adaptive Shape Co-Segmentation with Group Consistency Loss
- StructureNet: Hierarchical Graph Networks for 3D Shape Generation
- GSPN: Generative Shape Proposal Network for 3D Instance Segmentation in Point Cloud
- TextureNet: Consistent Local Parametrizations for Learning from High-Resolution Signals on Meshes
- Supervised Fitting of Geometric Primitives to 3D Point Clouds
- PartNet: A Large-scale Benchmark for Fine-grained and Hierarchical Part-level 3D Object Understanding
- GeoNet: Deep Geodesic Networks for Point Cloud Analysis
- Deep Part Induction from Articulated Object Pairs
- Beyond Holistic Object Recognition: Enriching Image Understanding with Part States
- PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space
- Learning Hierarchical Shape Segmentation and Labeling from Online Repositories
- SyncSpecCNN: Synchronized Spectral CNN for 3D Shape Segmentation
- A Scalable Active Framework for Region Annotation in 3D Shape Collections
- ShapeNet: An Information-Rich 3D Model Repository
- 3D-Assisted Image Feature Synthesis for Novel Views of an Object
- Image Super-Resolution Via Analysis Sparse Prior





















































































