Hanzi Mao
I'm a research scientist on the Google DeepMind Robotics team. Previously, I was a research scientist in Nvidia's Deep Imagination Research group, building world models for Physical AI. Before that, I was a research scientist at Facebook AI Research (FAIR). I am interested in building intelligent machines that can perceive, reason, and act in the real world. I believe foundation models that learn from large-scale, diverse data can accelerate our progress toward this goal.
✉ hannamao15 at gmail dot com
Selected Publications
Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control
Nvidia
* Research tech lead
Multi-control models that generate world states across different environments and lighting conditions from ground-truth and structured inputs.
Cosmos World Foundation Model Platform for Physical AI
Nvidia
* Research tech lead
Best of CES, Best of AI, CNET 2025
paper / project / code / video
A world foundation model platform to advance the development of autonomous systems.
Segment Anything
Alexander Kirillov¹,²,⁴, Eric Mintun², Nikhila Ravi¹,², Hanzi Mao², Chloe Rolland³, Laura Gustafson³, Tete Xiao³, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár⁴, Ross Girshick⁴
* Joint first author
ICCV, 2023. Best Paper Honorable Mention
paper / project / dataset / code
A new task, model, and dataset for image segmentation.
Exploring Plain Vision Transformer Backbones for Object Detection
Yanghao Li, Hanzi Mao, Ross Girshick†, Kaiming He†
ECCV, 2022
A plain, non-hierarchical Vision Transformer (ViT) as a backbone network for object detection.
A ConvNet for the 2020s
Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie
CVPR, 2022
A pure ConvNet model constructed entirely from standard ConvNet modules. ConvNeXt is accurate, efficient, scalable and very simple in design.
Context-aware Deep Representation Learning for Geo-spatiotemporal Analysis
Hanzi Mao, Xi Liu, Nick Duffield, Hao Yuan, Shuiwang Ji, Binayak Mohanty
ICDM, 2020
A novel semi-supervised attention-based deep representation model that learns context-aware spatiotemporal representations.
Gap Filling of High-Resolution Soil Moisture for SMAP/Sentinel-1: A Two-layer Machine Learning-based Framework
Hanzi Mao, Dhruva Kathuria, Nick Duffield, Binayak Mohanty
Water Resources Research, 2019
A new gap‐filled soil moisture product to address the poor spatial and temporal coverage of the SMAP/Sentinel‐1 product.