Yunze Liu - Tsinghua University
About Me
I am a fourth-year Ph.D. student at the Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University, advised by Prof. Li Yi (弋力).
I also work as an AI Researcher at Memories.ai, collaborating with Prof. Shawn Shen (University of Bristol).
My research interests lie in 4D Human-Object Interaction (HOI), Embodied AI, and Multimodal Large Language Models (MLLMs). I am passionate about equipping robots with the ability to understand and interact with the dynamic 4D world.
News
- Nov 2025 Two papers accepted to WACV 2026.
- Nov 2025 My total citations surpassed 600.
- Aug 2025 One paper accepted to ACM MM 2025.
- Jul 2025 One paper accepted to ICCV 2025.
- Jul 2025 One paper accepted to EMNLP 2025.
- Jun 2025 The HOI4D dataset surpassed 200 citations.
- May 2025 One paper accepted to UAI 2025.
- Feb 2025 Two papers accepted to CVPR 2025.
- Nov 2024 One paper accepted to 3DV 2025.
- Jul 2024 One paper accepted to ACM MM 2024 (Oral).
- Feb 2024 One paper accepted to CVPR 2024.
- Jan 2024 One paper accepted to ICRA 2024.
- Jul 2023 One paper accepted to ICCV 2023.
- Feb 2023 One paper accepted to CVPR 2023.
- Jul 2022 One paper accepted to ECCV 2022.
- Jul 2021 One paper accepted to ICCV 2021.
Publications
* denotes equal contribution, ^ denotes corresponding author.
ST-Think: How Multimodal Large Language Models Reason About 4D Worlds from Ego-Centric Videos
Peiran Wu*, Yunze Liu*, Miao Liu, Junxiao Shen
WACV, 2026
Investigates how multimodal large language models understand and reason about dynamic 4D environments from egocentric video inputs.
PointNet4D: A Lightweight 4D Point Cloud Video Backbone for Online and Offline Perception in Robotic Applications
Yunze Liu, Zifan Wang, Peiran Wu, Jiayang Ao
WACV, 2026
Proposes PointNet4D, a lightweight hybrid Mamba-Transformer 4D backbone with 4DMAP pretraining for efficient online and offline point cloud video perception in robotics. It achieves strong results across nine tasks and enables 4D Diffusion Policy and 4D Imitation Learning on the RoboTwin and HandoverSim benchmarks.
CULTURE3D: A Large-Scale and Diverse Dataset of Cultural Landmarks and Terrains for Gaussian-Based Scene Rendering
Xinyi Zheng*, Steve Zhang*, Weizhe Lin, Aaron Zhang, Walterio W. Mayol-Cuevas, Yunze Liu^, Junxiao Shen^
ICCV, 2025
Introduces a 10-billion-point ultra-high-resolution drone dataset of 20 culturally significant landmarks and terrains, enabling large-scale Gaussian-based 3D scene reconstruction and benchmarking.
X-LeBench: A Benchmark for Extremely Long Egocentric Video Understanding
Wenqi Zhou*, Kai Cao*, Hao Zheng, Yunze Liu, Xinyi Zheng, Miao Liu, Per Ola Kristensson, Walterio W. Mayol-Cuevas, Fan Zhang, Weizhe Lin, Junxiao Shen
EMNLP (Findings), 2025
Builds a life-logging simulation pipeline and benchmark with 432 ultra-long egocentric video logs (23 minutes–16.4 hours) to stress-test multimodal LLMs on temporal reasoning, memory, and context aggregation.
OnlineHOI: Towards Online Human-Object Interaction Generation and Perception
Yihong Ji, Yunze Liu, Yiyao Zhuo, Weijiang Yu, Fei Ma, Joshua Huang, Fei Yu
ACM Multimedia (ACM MM), 2025
Presents an online framework that jointly predicts future human motions and perceives object states for streaming HOI scenarios.
MutualNeRF: Improve the Performance of NeRF under Limited Samples with Mutual Information Theory
Zifan Wang, Jingwei Li, Yitang Li, Yunze Liu^
Uncertainty in Artificial Intelligence (UAI), 2025
Proposes MutualNeRF, a mutual-information-driven framework that improves NeRF under limited views via MI-guided sparse view sampling and few-shot regularization in both semantic and pixel spaces.
MAP: Unleashing Hybrid Mamba-Transformer Vision Backbone's Potential with Masked Autoregressive Pretraining
Yunze Liu, Li Yi
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
Introduces Masked Autoregressive Pretraining that jointly optimizes Mamba and Transformer blocks via local MAE-style reconstruction and global autoregressive generation, yielding state-of-the-art hybrid vision backbones.
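To make the idea concrete, here is a minimal sketch of a masked autoregressive pretraining objective: an encoder is trained jointly with a local MAE-style loss that reconstructs masked patches and a global autoregressive loss that predicts each patch embedding from its predecessors. This is an illustration under simplifying assumptions, not the paper's implementation: a toy Transformer stands in for the hybrid Mamba-Transformer backbone, and `ToyMAPPretrainer` and all dimensions are hypothetical.

```python
# Sketch of a masked-autoregressive pretraining objective (illustrative
# placeholders only, not the paper's code): a shared encoder receives
# partially masked patch tokens and is optimized with (i) an MAE-style
# reconstruction loss on masked positions and (ii) an autoregressive
# loss predicting each patch embedding from the previous one.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMAPPretrainer(nn.Module):
    def __init__(self, num_patches=64, dim=128, mask_ratio=0.5):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.patch_embed = nn.Linear(16 * 16 * 3, dim)      # toy patchifier
        self.encoder = nn.TransformerEncoder(               # stand-in for a hybrid backbone
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True),
            num_layers=2)
        self.recon_head = nn.Linear(dim, 16 * 16 * 3)       # local masked reconstruction
        self.ar_head = nn.Linear(dim, dim)                  # global next-patch prediction
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.randn(1, num_patches, dim) * 0.02)

    def forward(self, patches):                             # patches: (B, N, 768)
        B, N, _ = patches.shape
        tok = self.patch_embed(patches) + self.pos
        mask = torch.rand(B, N, device=patches.device) < self.mask_ratio
        tok_in = torch.where(mask.unsqueeze(-1),
                             self.mask_token.expand(B, N, -1), tok)
        z = self.encoder(tok_in)
        # (i) MAE-style loss, computed on masked positions only
        loss_mae = F.mse_loss(self.recon_head(z)[mask], patches[mask])
        # (ii) autoregressive loss: predict patch t+1's embedding from patch t
        loss_ar = F.mse_loss(self.ar_head(z[:, :-1]), tok.detach()[:, 1:])
        return loss_mae + loss_ar

model = ToyMAPPretrainer()
loss = model(torch.randn(2, 64, 768))   # random "patches" just to exercise both losses
```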
MobileH2R: Learning Generalizable Human to Mobile Robot Handover Exclusively from Scalable and Diverse Synthetic Data
Zifan Wang*, Ziqing Chen*, Junyu Chen*, Jilong Wang, Yuxin Yang, Yunze Liu, Xueyi Liu, He Wang, Li Yi
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
Scales to 100K+ synthetic handover scenes and distills them through 4D imitation learning into safe, generalizable mobile robot policies that outperform baselines by at least 15% in success rate in both simulation and the real world.
Interactive Humanoid: Online Full-Body Motion Reaction Synthesis with Social Affordance Canonicalization and Forecasting
Yunze Liu, Changxi Chen, Li Yi
International Conference on 3D Vision (3DV), 2025
Proposes a framework for synthesizing physically plausible humanoid reactions using forward dynamics guidance and social affordance forecasting.
PhysReaction: Physically Plausible Real-Time Humanoid Reaction Synthesis via Forward Dynamics Guided 4D Imitation
Yunze Liu, Changxi Chen, Chen Ding, Li Yi
ACM Multimedia (ACM MM), 2024 (Oral Presentation)
Leverages forward dynamics to guide 4D imitation learning toward realistic, physically plausible humanoid reactions in real time.
CrossVideo: Self-supervised Cross-modal Contrastive Learning for Point Cloud Video Understanding
Yunze Liu, Changxi Chen, Zifan Wang, Li Yi
IEEE International Conference on Robotics and Automation (ICRA), 2024
Introduces CrossVideo, a self-supervised cross-modal contrastive learning framework for point cloud video understanding.
Physics-aware Hand-object Interaction Denoising
Haoping Luo, Yunze Liu, Li Yi
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
Proposes a physics-aware denoising framework that leverages physical constraints to improve the accuracy of hand-object interaction predictions.
LeaF: Learning Frames for 4D Point Cloud Sequence Understanding
Yunze Liu, Junyu Chen, Zekun Zhang, Jingwei Huang, Li Yi
IEEE/CVF International Conference on Computer Vision (ICCV), 2023
Introduces learnable frame representations to effectively capture temporal dynamics in 4D point cloud sequences.
Complete-to-Partial 4D Distillation for Self-Supervised Point Cloud Sequence Representation Learning
Zhuoyang Zhang, Yuhao Dong, Yunze Liu, Li Yi
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
Introduces a self-supervised complete-to-partial 4D distillation framework that strengthens point cloud sequence representation learning.
Point Primitive Transformer for Long-Term 4D Point Cloud Video Understanding
Hao Wen*, Yunze Liu*, Jingwei Huang, Bo Duan, Li Yi
European Conference on Computer Vision (ECCV), 2022
Introduces a point primitive transformer for long-term 4D point cloud video understanding.
HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction
Yunze Liu, Yun Liu, Che Jiang, Kangbo Lyu, Weikang Wan, Hao Shen, Boqiang Liang, Zhoujie Fu, He Wang, Li Yi
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
Releases HOI4D, a large-scale egocentric 4D HOI dataset with category-level annotations that enables learning robust human-object interaction perception and manipulation policies.
Contrastive Multimodal Fusion with TupleInfoNCE
Yunze Liu, Qingnan Fan, Shanghang Zhang, Hao Dong, Thomas Funkhouser, Li Yi
IEEE/CVF International Conference on Computer Vision (ICCV), 2021
Introduces TupleInfoNCE, a tuple-based contrastive learning objective that improves multimodal fusion.
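As background, the sketch below shows a generic InfoNCE-style contrastive loss applied to fused multimodal tuple embeddings, the basic mechanism that tuple-based contrastive fusion builds on. It is a simplified illustration, not the paper's exact TupleInfoNCE objective; `tuple_contrastive_loss` and the toy concatenation-based fusion are assumptions for demonstration.

```python
# Generic InfoNCE-style loss over fused multimodal "tuple" embeddings
# (illustrative only, not the paper's TupleInfoNCE formulation): tuples
# from the same sample under two views are pulled together against all
# other tuples in the batch.
import torch
import torch.nn.functional as F

def tuple_contrastive_loss(anchor, positive, temperature=0.07):
    """anchor, positive: (B, D) fused tuple embeddings of the same
    B samples under two different views/augmentations."""
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    logits = a @ p.t() / temperature          # (B, B) cosine similarities
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)   # diagonal pairs are the positives

# Toy usage: fuse per-modality features (e.g. RGB, depth, point cloud)
# by concatenation + projection before applying the loss.
rgb, depth, pts = (torch.randn(8, 32) for _ in range(3))
proj = torch.nn.Linear(96, 64)
view1 = proj(torch.cat([rgb, depth, pts], dim=-1))
view2 = proj(torch.cat([rgb, depth, pts], dim=-1) + 0.1 * torch.randn(8, 96))
loss = tuple_contrastive_loss(view1, view2)
```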
Academic Services
- Conference Reviewer: CVPR, ICCV, ECCV, ACM MM, ICLR, NeurIPS, ICML, 3DV, WACV, AAAI
- Journal Reviewer: IEEE Transactions on Multimedia (TMM)
- Workshop Organizer: CVPR 2023 Workshop on 4D Hand Object Interaction