Hanzi Mao

I'm a research scientist on the Google DeepMind Robotics team. Previously, I was a research scientist at Nvidia Deep Imagination Research, building world models for Physical AI. Before that, I was a research scientist at Facebook AI Research (FAIR). I am interested in building intelligent machines that can perceive, reason, and act in the real world. I believe foundation models that learn from large-scale, diverse data can accelerate progress toward this goal.

Selected Publications


Cosmos-Predict2

Cosmos-Predict2

Nvidia

* Research tech lead

A world foundation model for Physical AI builders — fully open and adaptable.

Cosmos-Transfer1

Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control

Nvidia

* Research tech lead

Multicontrol models to generate world states across different environments and lighting using ground-truth and structured inputs.

Cosmos Platform

Cosmos World Foundation Model Platform for Physical AI

Nvidia

* Research tech lead

Best of CES, Best of AI, CNET 2025

A world foundation model platform to advance the development of autonomous systems.

Segment Anything

Segment Anything

Alexander Kirillov¹,²,⁴, Eric Mintun², Nikhila Ravi¹,², Hanzi Mao², Chloe Rolland³, Laura Gustafson³, Tete Xiao³, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár⁴, Ross Girshick⁴

* Joint first author

ICCV, 2023. Best Paper Honorable Mention

A new task, model, and dataset for image segmentation.

ViTDet

Exploring Plain Vision Transformer Backbones for Object Detection

Yanghao Li, Hanzi Mao, Ross Girshick†, Kaiming He†

ECCV, 2022

A plain, non-hierarchical Vision Transformer (ViT) as a backbone network for object detection.

ConvNeXt

A ConvNet for the 2020s

Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie

CVPR, 2022

A pure ConvNet model constructed entirely from standard ConvNet modules. ConvNeXt is accurate, efficient, scalable and very simple in design.

Context-aware Deep Representation

Context-aware Deep Representation Learning for Geo-spatiotemporal Analysis

Hanzi Mao, Xi Liu, Nick Duffield, Hao Yuan, Shuiwang Ji, Binayak Mohanty

ICDM, 2020

A novel semi-supervised attention-based deep representation model that learns context-aware spatiotemporal representations.

Gap Filling

Gap Filling of High-Resolution Soil Moisture for SMAP/Sentinel-1: A Two-layer Machine Learning-based Framework

Hanzi Mao, Dhruva Kathuria, Nick Duffield, Binayak Mohanty

Water Resources Research, 2019

A new gap‐filled soil moisture product to address the poor spatial and temporal coverage of the SMAP/Sentinel‐1 product.

Misc


When I'm not doing research, I find balance in exercising, growing plants, traveling to experience different cultures, and learning more about the Earth's natural history. I hope for a world of peace and love.