Yunze Liu - Tsinghua University
About Me
I am a fourth-year Ph.D. student at the Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University, advised by Prof. Li Yi (弋力).
I also work as an AI Researcher at Memories.ai, collaborating with Prof. Shawn Shen (University of Bristol).
My research interests lie in 4D Human-Object Interaction (HOI), Embodied AI, and Multimodal Large Language Models (MLLMs). I am passionate about equipping robots with the ability to understand and interact with the dynamic 4D world.
News
- Nov 2025 Two papers accepted to WACV 2026.
- Nov 2025 My total citations surpassed 600.
- Aug 2025 One paper accepted to ACM MM 2025.
- Jul 2025 One paper accepted to ICCV 2025.
- Jul 2025 One paper accepted to EMNLP 2025.
- Jun 2025 The HOI4D dataset surpassed 200 citations.
- May 2025 One paper accepted to UAI 2025.
- Feb 2025 Two papers accepted to CVPR 2025.
- Nov 2024 One paper accepted to 3DV 2025.
- Jul 2024 One paper accepted to ACM MM 2024 (Oral).
- Feb 2024 One paper accepted to CVPR 2024.
- Jan 2024 One paper accepted to ICRA 2024.
- Jul 2023 One paper accepted to ICCV 2023.
- Feb 2023 One paper accepted to CVPR 2023.
- Jul 2022 One paper accepted to ECCV 2022.
- Jul 2021 One paper accepted to ICCV 2021.
Publications
* denotes equal contribution, ^ denotes corresponding author.
ST-Think: How Multimodal Large Language Models Reason About 4D Worlds from Ego-Centric Videos
Peiran Wu*, Yunze Liu*, Miao Liu, Junxiao Shen
WACV, 2026
Investigates how multimodal large language models understand and reason about dynamic 4D environments from egocentric video inputs.
PointNet4D: A Lightweight 4D Point Cloud Video Backbone for Online and Offline Perception in Robotic Applications
Yunze Liu, Zifan Wang, Peiran Wu, Jiayang Ao
WACV, 2026
Proposes PointNet4D, a lightweight hybrid Mamba-Transformer 4D backbone with 4DMAP pretraining for efficient online and offline point cloud video perception in robotics. It achieves strong results across nine tasks and enables 4D Diffusion Policy and 4D Imitation Learning on the RoboTwin and HandoverSim benchmarks.
CULTURE3D: A Large-Scale and Diverse Dataset of Cultural Landmarks and Terrains for Gaussian-Based Scene Rendering
Xinyi Zheng*, Steve Zhang*, Weizhe Lin, Aaron Zhang, Walterio W. Mayol-Cuevas, Yunze Liu^, Junxiao Shen^
ICCV, 2025
Introduces a 10-billion-point ultra-high-resolution drone dataset of 20 culturally significant landmarks and terrains, enabling large-scale Gaussian-based 3D scene reconstruction and benchmarking.
X-LeBench: A Benchmark for Extremely Long Egocentric Video Understanding
Wenqi Zhou*, Kai Cao*, Hao Zheng, Yunze Liu, Xinyi Zheng, Miao Liu, Per Ola Kristensson, Walterio W. Mayol-Cuevas, Fan Zhang, Weizhe Lin, Junxiao Shen
EMNLP (Findings), 2025
Builds a life-logging simulation pipeline and benchmark with 432 ultra-long egocentric video logs (23 minutes–16.4 hours) to stress-test multimodal LLMs on temporal reasoning, memory, and context aggregation.
OnlineHOI: Towards Online Human-Object Interaction Generation and Perception
Yihong Ji, Yunze Liu, Yiyao Zhuo, Weijiang Yu, Fei Ma, Joshua Huang, Fei Yu
ACM Multimedia (ACM MM), 2025
Presents an online framework that jointly predicts future human motions and perceives object states for streaming HOI scenarios.
MutualNeRF: Improve the Performance of NeRF under Limited Samples with Mutual Information Theory
Zifan Wang, Jingwei Li, Yitang Li, Yunze Liu^
Uncertainty in Artificial Intelligence (UAI), 2025
Proposes MutualNeRF, a mutual-information-driven framework that improves NeRF under limited views via MI-guided sparse view sampling and few-shot regularization in both semantic and pixel spaces.
MAP: Unleashing Hybrid Mamba-Transformer Vision Backbone's Potential with Masked Autoregressive Pretraining
Yunze Liu, Li Yi
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
Introduces Masked Autoregressive Pretraining that jointly optimizes Mamba and Transformer blocks via local MAE-style reconstruction and global autoregressive generation, yielding state-of-the-art hybrid vision backbones.
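To make the idea concrete, here is a minimal sketch of a masked autoregressive pretraining objective: an encoder is trained jointly with a local MAE-style loss that reconstructs masked patches and a global autoregressive loss that predicts each patch embedding from its predecessors. This is an illustration under simplifying assumptions, not the paper's implementation: a toy Transformer stands in for the hybrid Mamba-Transformer backbone, and `ToyMAPPretrainer` and all dimensions are hypothetical.

```python
# Sketch of a masked-autoregressive pretraining objective (illustrative
# placeholders only, not the paper's code): a shared encoder receives
# partially masked patch tokens and is optimized with (i) an MAE-style
# reconstruction loss on masked positions and (ii) an autoregressive
# loss predicting each patch embedding from the previous one.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMAPPretrainer(nn.Module):
    def __init__(self, num_patches=64, dim=128, mask_ratio=0.5):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.patch_embed = nn.Linear(16 * 16 * 3, dim)      # toy patchifier
        self.encoder = nn.TransformerEncoder(               # stand-in for a hybrid backbone
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True),
            num_layers=2)
        self.recon_head = nn.Linear(dim, 16 * 16 * 3)       # local masked reconstruction
        self.ar_head = nn.Linear(dim, dim)                  # global next-patch prediction
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.randn(1, num_patches, dim) * 0.02)

    def forward(self, patches):                             # patches: (B, N, 768)
        B, N, _ = patches.shape
        tok = self.patch_embed(patches) + self.pos
        mask = torch.rand(B, N, device=patches.device) < self.mask_ratio
        tok_in = torch.where(mask.unsqueeze(-1),
                             self.mask_token.expand(B, N, -1), tok)
        z = self.encoder(tok_in)
        # (i) MAE-style loss, computed on masked positions only
        loss_mae = F.mse_loss(self.recon_head(z)[mask], patches[mask])
        # (ii) autoregressive loss: predict patch t+1's embedding from patch t
        loss_ar = F.mse_loss(self.ar_head(z[:, :-1]), tok.detach()[:, 1:])
        return loss_mae + loss_ar

model = ToyMAPPretrainer()
loss = model(torch.randn(2, 64, 768))   # random "patches" just to exercise both losses
```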
MobileH2R: Learning Generalizable Human to Mobile Robot Handover Exclusively from Scalable and Diverse Synthetic Data
Zifan Wang*, Ziqing Chen*, Junyu Chen*, Jilong Wang, Yuxin Yang, Yunze Liu, Xueyi Liu, He Wang, Li Yi
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
Scales to 100K+ synthetic handover scenes and distills them through 4D imitation learning into safe, generalizable mobile robot policies that outperform baselines by at least 15% in success rate in both simulation and the real world.
Interactive Humanoid: Online Full-Body Motion Reaction Synthesis with Social Affordance Canonicalization and Forecasting
Yunze Liu, Changxi Chen, Li Yi
International Conference on 3D Vision (3DV), 2025
Proposes a framework for synthesizing physically plausible humanoid reactions using forward dynamics guidance and social affordance forecasting.
PhysReaction: Physically Plausible Real-Time Humanoid Reaction Synthesis via Forward Dynamics Guided 4D Imitation
Yunze Liu, Changxi Chen, Chen Ding, Li Yi
ACM Multimedia (ACM MM), 2024 (Oral Presentation)
Leverages forward dynamics to guide 4D imitation learning toward realistic, physically plausible humanoid reactions in real time.
CrossVideo: Self-supervised Cross-modal Contrastive Learning for Point Cloud Video Understanding
Yunze Liu, Changxi Chen, Zifan Wang, Li Yi
IEEE International Conference on Robotics and Automation (ICRA), 2024
Introduces CrossVideo, a self-supervised cross-modal contrastive learning framework for point cloud video understanding.
Physics-aware Hand-object Interaction Denoising
Haoping Luo, Yunze Liu, Li Yi
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
Proposes a physics-aware denoising framework that leverages physical constraints to improve the accuracy of hand-object interaction predictions.
LeaF: Learning Frames for 4D Point Cloud Sequence Understanding
Yunze Liu, Junyu Chen, Zekun Zhang, Jingwei Huang, Li Yi
IEEE/CVF International Conference on Computer Vision (ICCV), 2023
Introduces learnable frame representations to effectively capture temporal dynamics in 4D point cloud sequences.
Complete-to-Partial 4D Distillation for Self-Supervised Point Cloud Sequence Representation Learning
Zhuoyang Zhang, Yuhao Dong, Yunze Liu, Li Yi
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
Introduces a self-supervised complete-to-partial 4D distillation framework that strengthens point cloud sequence representation learning.
Point Primitive Transformer for Long-Term 4D Point Cloud Video Understanding
Hao Wen*, Yunze Liu*, Jingwei Huang, Bo Duan, Li Yi
European Conference on Computer Vision (ECCV), 2022
Introduces a point primitive transformer for long-term 4D point cloud video understanding.
HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction
Yunze Liu, Yun Liu, Che Jiang, Kangbo Lyu, Weikang Wan, Hao Shen, Boqiang Liang, Zhoujie Fu, He Wang, Li Yi
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
Releases HOI4D, a large-scale egocentric 4D HOI dataset with category-level annotations that enables learning robust human-object interaction perception and manipulation policies.
Contrastive Multimodal Fusion with TupleInfoNCE
Yunze Liu, Qingnan Fan, Shanghang Zhang, Hao Dong, Thomas Funkhouser, Li Yi
IEEE/CVF International Conference on Computer Vision (ICCV), 2021
Introduces TupleInfoNCE, a tuple-based contrastive learning objective that improves multimodal fusion.
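As background, the sketch below shows a generic InfoNCE-style contrastive loss applied to fused multimodal tuple embeddings, the basic mechanism that tuple-based contrastive fusion builds on. It is a simplified illustration, not the paper's exact TupleInfoNCE objective; `tuple_contrastive_loss` and the toy concatenation-based fusion are assumptions for demonstration.

```python
# Generic InfoNCE-style loss over fused multimodal "tuple" embeddings
# (illustrative only, not the paper's TupleInfoNCE formulation): tuples
# from the same sample under two views are pulled together against all
# other tuples in the batch.
import torch
import torch.nn.functional as F

def tuple_contrastive_loss(anchor, positive, temperature=0.07):
    """anchor, positive: (B, D) fused tuple embeddings of the same
    B samples under two different views/augmentations."""
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    logits = a @ p.t() / temperature          # (B, B) cosine similarities
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)   # diagonal pairs are the positives

# Toy usage: fuse per-modality features (e.g. RGB, depth, point cloud)
# by concatenation + projection before applying the loss.
rgb, depth, pts = (torch.randn(8, 32) for _ in range(3))
proj = torch.nn.Linear(96, 64)
view1 = proj(torch.cat([rgb, depth, pts], dim=-1))
view2 = proj(torch.cat([rgb, depth, pts], dim=-1) + 0.1 * torch.randn(8, 96))
loss = tuple_contrastive_loss(view1, view2)
```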
Academic Services
- Conference Reviewer: CVPR, ICCV, ECCV, ACM MM, ICLR, NeurIPS, ICML, 3DV, WACV, AAAI
- Journal Reviewer: IEEE Transactions on Multimedia (TMM)
- Workshop Organizer: CVPR 2023 Workshop on 4D Hand Object Interaction