Jiayuan Mao
Email: jiayuanm [at] seas.upenn.edu
Jiayuan Mao is a research scientist at Amazon Frontier AI & Robotics and an incoming assistant professor at the University of Pennsylvania. She completed her PhD at MIT, advised by Josh Tenenbaum and Leslie Kaelbling.
News
- I'm recruiting PhD students this application cycle (Fall 2026). Please apply to the CIS PhD program at Penn by December 15th and list my name on your application. Unfortunately, I’m unable to respond to individual inquiries outside the official application process.
- To junior PhD, master's, and undergraduate students: if you would like to chat about life, career plans, or research ideas related to AI/ML, feel free to email me to schedule a meeting. I dedicate 30 minutes every week to such meetings. I encourage students from underrepresented groups (including but not limited to gender, racial, and ethnic minority groups) to reach out, and will prioritize these meetings.
- I'm co-organizing the Workshop on Space in Vision, Language, and Embodied AI and the Embodied Agent Interface Challenge at NeurIPS 2025.
- I'm co-organizing the Workshop on Memory and Vision at ICCV 2025.
- I will be giving a talk at the Workshop on Human-Robot-Scene Interaction and Collaboration at ICCV 2025.
- I'm co-organizing the Workshop on Resource-Rational Robot Learning at CoRL 2025.
- I'm co-organizing the Workshop on Programmatic Reinforcement Learning at RLC 2025.
- I'm co-organizing the Workshop on Programmatic Representations for Agent Learning at ICML 2025.
- I'm co-organizing the Workshop on Robot Planning in the Era of Foundation Models at RSS 2025.
- I'm co-organizing the Workshop on Visual Concepts and the Workshop on Foundation Models Meet Embodied Agents at CVPR 2025.
- I'm co-organizing the Workshop on Learning Meets Model-Based Methods for Contact-Rich Manipulation at ICRA 2025.
- I will be giving a talk at the Workshop on Foundation Models and Neuro-Symbolic AI for Robotics at ICRA 2025.
- I'm co-organizing the "Learning Language through Grounding" tutorial and the "Foundation Models Meet Embodied Agents" tutorial at NAACL 2025.
- I'm co-organizing the Workshop on Planning in the Era of LLMs and the "Foundation Models Meet Embodied Agents" tutorial at AAAI 2025.
- I was selected as a Rising Star in EECS 2024.
- I was selected as a Rising Star in Generative AI 2024.
- I'm co-organizing the Workshop on Learning Effective Abstractions for Planning at CoRL 2024.
- I'm co-organizing and will be giving a talk at the Workshop on Visual Concepts at ECCV 2024.
- I will be giving a talk at the Bimanual Manipulation: On Kitchen Challenges workshop at ICRA 2024.
- I will be giving a talk at the Brown Robotics Talks at Brown University. (Slides: Compositional Action Representations)
- I will be giving a talk at the NSF Workshop on Hardware-Software Co-design for Neuro-Symbolic Computation.
- I will be giving a talk at the Manipulation Reading Group at the Robotics Institute at Carnegie Mellon University.
- I will be giving a talk at the Coordinated Science Laboratory Student Conference (CSLSC 2024) at the University of Illinois at Urbana-Champaign. (Slides: Integrated Learning and Planning)
- I will be giving a talk at the Robot Representations For Scene Understanding, Reasoning and Planning workshop at RSS 2023. (Slides: Neuro-Symbolic Concepts for Robotic Manipulation / Video)
- I'm co-organizing the Visually Grounded Interaction and Language (VIGIL) workshop at NAACL 2021.
- I'm co-organizing the Neuro-Symbolic Visual Reasoning and Program Synthesis tutorial at CVPR 2020.
Research Highlights
My long-term research goal is to build machines that can continually learn concepts (e.g., properties, relations, skills, rules, and algorithms) from their experiences and apply them for reasoning and planning in the physical world. The central theme of my research is to decompose the learning problem into learning a vocabulary of neuro-symbolic concepts. The symbolic part describes their structures and how different concepts can be composed; the neural part handles grounding in perception and physics. I leverage these structures to make learning more data-efficient and more compositionally generalizable, and to make inference and planning faster.
How should we represent various types of concepts? How do we capture the programmatic structures underlying these concepts (the Theory-Theory of Concepts)?
- Language Structures
  - Phrase Structure from Grounding: [Visually-Grounded Neural Syntax]
  - Combinatory Categorial Grammar: [Grammar-Based Grounded Lexicon]
  - Logographic Structures: [Logographic Library Learning]
- Visual Concepts
  - Object-Centric Concepts: [Neuro-Symbolic Concept Learner], [Unified Visual Semantic Embedding]
    - + 3D: [Neuro-Symbolic 3D]
    - + Object Representation Learning: [Language-Mediated ORL]
  - Metaconcepts: [Visual Concept & Metaconcept]
    - + Continual Learning: [FALCON]
  - Physical Events and Causality: [Dynamic Concept Learning], [CLEVRER-Humans]
  - Abstract Concepts: [Visual Abstraction]
  - Programmatic Image Representations: [Program-Guided Image Manipulators]
    - + Perspective: [P3I]
    - + Multi-Plane: [Box Program]
  - Programmatic Human Motion: [Motion Program], [Motion Concept]
- Action Concepts
  - Neuro-Symbolic Policies: [NSPort]
    - + Part-Centric Skill Modeling: [Composable Part-Based Policy]
    - + Part-Centric Keypoints: [Keypoint Abstraction Imitation]
    - + Hierarchical Task and Motion: [Modes from Language]
    - + Learning from Video: [Actions from Actionless Videos]
  - Neuro-Symbolic Planning Representations: [PDSketch]
    - + Contact-Centric Skill Modeling: [Manipulation Mechanisms]
    - + Planning Domains from LLM: [Adaptive Domain from Language]
  - Hybrid Declarative-Imperative Behavior Representations: [CROW]
    - + Behavior Rules from LLM: [Compositional Behaviors from Demonstration and Language]
    - + Digital Agent Workflow: [AWM]
How can we efficiently learn these concepts from natural supervision (e.g., language, videos)? How can we leverage the structures of these concepts to make inference and planning faster?
- Learning and Decision Theory
  - Relational Neural Network Learning: [Neural Logic Machines]
    - + Expressiveness and Generalization: [NLM Theory]
    - + PAC Learning and Sample Complexity: [Polynomial MHLA]
  - Planning Complexity: [Regression Width]
  - Hybrid Policy and Planning: [CROW]
- Differentiable Concept Learning
  - Differentiable Reasoning: [Neuro-Symbolic Concept Learner]
    - + Joint Syntax Learning: [G2L2]
    - + Foundation Model Integration: [Logic-Enhanced FM]
  - First-Order Logic Rules and Policies: [Neural Logic Machines]
    - + Sparsity and Locality: [SpaLoc Networks]
    - + Diffusion: [Diffusion-as-Adaptive-Reasoning]
  - Temporal Logic Rules and Policies: [TOQ Networks]
    - + Rationality: [Rational Subgoals]
- Planning and Inference Algorithms
  - Planning with Hybrid Declarative-Imperative Representations: [CROW]
  - Planning with Neuro-Symbolic Domain Models: [PDSketch]
    - + Planning with Mechanisms: [Manipulation Mechanisms]
    - + Planning with Contact Analogy: [Contact Analogy]
  - Neuro-Symbolic Constraint Solver: [Diffusion CCSP]
    - + Constraints from Language: [Functional Arrangement]
  - Optimization as Inference: [Diffusion-as-Adaptive-Reasoning]
Publications
Topics:
Concept Learning and Language Acquisition /
Reasoning and Planning /
Scene and Activity Understanding
Past topics: Object Detection /
Structured NLP
(*/†: indicates equal contribution.)
Learning Linear Attention in Polynomial Time
Morris Yau,
Ekin Akyürek,
Jiayuan Mao,
Joshua B. Tenenbaum,
Stefanie Jegelka,
Jacob Andreas
NeurIPS 2025 (Oral) Paper
Creativity or Brute Force? Using Brainteasers as a Window into the Problem-Solving Abilities of Large Language Models
Simeng Han, Stephen Xia, Grant Zhang, Howard Dai, Chen Liu, Lichang Chen, Hoang H Nguyen, Hongyuan Mei, Jiayuan Mao, R. Thomas McCoy
Finding Structure in Logographic Writing with Library Learning II: Grapheme, Sound, and Meaning Systematicity
Guangyuan Jiang, Matthias Hofer, Jiayuan Mao, Lio Wong, Joshua B. Tenenbaum, Roger P. Levy
Agent Workflow Memory
Zora Zhiruo Wang, Jiayuan Mao, Daniel Fried, Graham Neubig
One-Shot Manipulation Strategy Learning by Making Contact Analogies
Yuyao Liu*, Jiayuan Mao*, Joshua Tenenbaum, Tomás Lozano-Pérez, Leslie Pack Kaelbling
ICRA 2025
Paper /
Project Page /
Code
CoRL Workshop on Learning Effective Abstractions for Planning 2024
Keypoint Abstraction using Large Models for Object-Relative Imitation Learning
Xiaolin Fang*, Bo-Ruei Huang*, Jiayuan Mao*, Jasmine Shone, Joshua B. Tenenbaum, Tomás Lozano-Pérez, Leslie Pack Kaelbling
ICRA 2025
Paper /
Project Page /
Code
CoRL 2024 Workshop on Language and Robot Learning (Best Paper)
Infer Human's Intentions Before Following Natural Language Instructions
Yanming Wan, Yue Wu, Yiping Wang, Jiayuan Mao*, Natasha Jaques*
What Makes a Maze Look Like a Maze?
Joy Hsu, Jiayuan Mao, Joshua B. Tenenbaum, Noah D. Goodman, Jiajun Wu
ICLR 2025
Paper /
Project Page /
Data
ECCV Human-Inspired Computer Vision Workshop 2024
BLADE: Learning Compositional Behaviors from Demonstration and Language
Weiyu Liu*, Neil Nie*, Ruohan Zhang, Jiayuan Mao†, Jiajun Wu†
CoRL 2024
Paper /
Project Page
CoRL 2024 Workshop on Learning Effective Abstractions for Planning (Oral)
Embodied Agent Interface: A Single Line to Evaluate LLMs for Embodied Decision Making
Manling Li*, Shiyu Zhao*, Qineng Wang*, Kangrui Wang*, Yu Zhou*, Sanjana Srivastava, Cem Gokmen, Tony Lee, Li Erran Li, Ruohan Zhang, Weiyu Liu, Percy Liang, Li Fei-Fei, Jiayuan Mao, Jiajun Wu
NeurIPS 2024 Datasets and Benchmarks Track (Oral)
SoCal NLP 2024 (Best Paper)
Paper /
Project Page /
Code /
Data
Hybrid Declarative-Imperative Representations for Hybrid Discrete-Continuous Decision-Making
Jiayuan Mao, Joshua B. Tenenbaum, Tomás Lozano-Pérez, Leslie Pack Kaelbling
Finding Structure in Logographic Writing with Library Learning
Guangyuan Jiang, Matthias Hofer, Jiayuan Mao, Lio Wong, Joshua B. Tenenbaum, Roger P. Levy
CogSci 2024 (Best Undergraduate Student Paper) Paper
Learning Iterative Reasoning through Energy Diffusion
Yilun Du*, Jiayuan Mao*, Joshua B. Tenenbaum
"Set It Up!": Functional Object Arrangement with Compositional Generative Models
Yiqing Xu, Jiayuan Mao, Yilun Du, Tomás Lozano-Pérez, Leslie Pack Kaelbling, David Hsu
RSS 2024
Paper /
Project Page
RSS Workshop on Task Specification for General-Purpose Intelligent Robots
Grounding Language Plans in Demonstrations through Counterfactual Perturbations
Yanwei Wang, Tsun-Hsuan Wang, Jiayuan Mao, Michael Hagenow, Julie Shah
ICLR 2024 (Spotlight) Paper / Project Page / Code / MIT News / TechCrunch
Learning Adaptive Planning Representations with Natural Language Guidance
Lio Wong*, Jiayuan Mao*, Pratyusha Sharma*, Zachary S. Siegel, Jiahai Feng, Noa Korneev, Joshua B. Tenenbaum, Jacob Andreas
Learning to Act from Actionless Video through Dense Correspondences
Po-Chen Ko, Jiayuan Mao, Yilun Du, Shao-Hua Sun, Joshua B. Tenenbaum
ICLR 2024 (Spotlight) Paper / Project Page / Code
What Planning Problem Can A Relational Neural Network Solve
Jiayuan Mao, Tomás Lozano-Pérez, Joshua B. Tenenbaum, Leslie Pack Kaelbling
NeurIPS 2023 (Spotlight) Paper / Project Page / Code
What’s Left? Concept Grounding with Logic-Enhanced Foundation Models
Joy Hsu*, Jiayuan Mao*, Joshua B. Tenenbaum, Jiajun Wu
Learning Reusable Manipulation Strategies
Jiayuan Mao, Tomás Lozano-Pérez, Joshua B. Tenenbaum, Leslie Pack Kaelbling
CoRL 2023
Paper /
Project Page
IROS 2023 Workshop on Leveraging Models for Contact-Rich Manipulation (Spotlight)
(Slides /
Video)
Compositional Diffusion-Based Continuous Constraint Solvers
Zhutian Yang, Jiayuan Mao, Yilun Du, Jiajun Wu, Joshua B. Tenenbaum, Tomás Lozano-Pérez, Leslie Pack Kaelbling
Composable Part-Based Manipulation
Weiyu Liu, Jiayuan Mao, Joy Hsu, Tucker Hermans, Animesh Garg, Jiajun Wu
NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations
Joy Hsu, Jiayuan Mao, Jiajun Wu
CVPR 2023
Paper /
Project Page /
Code
CVPR 2023 Workshop On Compositional 3D Vision (Oral)
Programmatically Grounded, Compositionally Generalizable Robotic Manipulation
Renhao Wang*, Jiayuan Mao*, Joy Hsu, Hang Zhao, Jiajun Wu, Yang Gao
ICLR 2023 (Notable Top 25%) Paper / Project Page
Learning Rational Subgoals from Demonstrations and Instructions
Zhezheng Luo*, Jiayuan Mao*, Jiajun Wu, Tomás Lozano-Pérez, Joshua B. Tenenbaum, Leslie Pack Kaelbling
On the Expressiveness and Generalization of Hypergraph Neural Networks
Zhezheng Luo, Jiayuan Mao, Joshua B. Tenenbaum, Leslie Pack Kaelbling
Sparse and Local Hypergraph Reasoning Networks
Guangxuan Xiao, Leslie Pack Kaelbling, Jiajun Wu, Jiayuan Mao
PDSketch: Integrated Domain Programming, Learning, and Planning
Jiayuan Mao, Tomás Lozano-Pérez, Joshua B. Tenenbaum, Leslie Pack Kaelbling
HandMeThat: Human-Robot Communication in Physical and Social Environments
Yanming Wan*, Jiayuan Mao*, Joshua B. Tenenbaum
NeurIPS 2022 Datasets and Benchmarks Track Paper / Project Page / Code
CLEVRER-Humans: Describing Physical and Causal Events the Human Way
Jiayuan Mao*, Xuelin Yang*, Xikun Zhang, Noah D. Goodman, Jiajun Wu
NeurIPS 2022 Datasets and Benchmarks Track Paper / Project Page / Code
IKEA-Manual: Seeing Shape Assembly Step by Step
Ruocheng Wang, Yunzhi Zhang, Jiayuan Mao, Ran Zhang, Chin-Yi Cheng, Jiajun Wu
NeurIPS 2022 Datasets and Benchmarks Track Paper / Project Page / Code
Translating a Visual LEGO Manual to a Machine-Executable Plan
Ruocheng Wang, Yunzhi Zhang, Jiayuan Mao, Chin-Yi Cheng, Jiajun Wu
Programmatic Concept Learning for Human Motion Description and Synthesis
Sumith Kulal*, Jiayuan Mao*, Alex Aiken†, Jiajun Wu†
FALCON: Fast Visual Concept Learning by Integrating Images, Linguistic descriptions, and Conceptual Relations
Lingjie Mei*, Jiayuan Mao*, Ziqi Wang, Chuang Gan, Joshua B. Tenenbaum
Grammar-Based Grounded Lexicon Learning
Jiayuan Mao, Haoyue Shi, Jiajun Wu, Roger P. Levy, Joshua B. Tenenbaum
Temporal and Object Quantification Networks
Jiayuan Mao*, Zhezheng Luo*, Chuang Gan, Joshua B. Tenenbaum, Jiajun Wu, Leslie Pack Kaelbling, Tomer D. Ullman
IJCAI 2021 Paper / Project Page / Code
(First two authors contributed equally; order determined by coin toss.)
Language-Mediated, Object-Centric Representation Learning
Ruocheng Wang*, Jiayuan Mao*, Samuel J. Gershman†, Jiajun Wu†
ACL 2021 (Findings)
Paper /
Talk /
Project Page
SpLU-RoboNLP 2021 (Oral)
Hierarchical Motion Understanding via Motion Programs
Sumith Kulal*, Jiayuan Mao*, Alex Aiken, Jiajun Wu
CVPR 2021 Paper / Talk / Project Page / Code
Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning
Zhenfang Chen, Jiayuan Mao, Jiajun Wu, Kwan-Yee K. Wong, Joshua B. Tenenbaum, Chuang Gan
Object-Centric Diagnosis of Visual Reasoning
Jianwei Yang, Jiayuan Mao, Jiajun Wu, Devi Parikh, David D. Cox, Joshua B. Tenenbaum, Chuang Gan
ArXiv 2020 Paper
Multi-Plane Program Induction with 3D Box Priors
Yikai Li*, Jiayuan Mao*, Xiuming Zhang, William T. Freeman, Joshua B. Tenenbaum, Noah Snavely, Jiajun Wu
Perspective Plane Program Induction from a Single Image
Yikai Li*, Jiayuan Mao*, Xiuming Zhang, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu
Visual Concept-Metaconcept Learning
Chi Han*, Jiayuan Mao*, Chuang Gan, Joshua B. Tenenbaum, Jiajun Wu
Program-Guided Image Manipulators
Jiayuan Mao*, Xiuming Zhang*, Yikai Li, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu
ICCV 2019 Paper / Project Page
(First two authors contributed equally; order determined by coin toss.)
Visually Grounded Neural Syntax Acquisition
Haoyue Shi*, Jiayuan Mao*, Kevin Gimpel, Karen Livescu
ACL 2019 (Best Paper Nomination) Paper / Project Page / Code
Neurally-Guided Structure Inference
Sidi Lu*, Jiayuan Mao*, Joshua B. Tenenbaum, Jiajun Wu
Unified Visual-Semantic Embeddings:
Bridging Vision and Language with Structured Meaning Representations
Hao Wu*, Jiayuan Mao*, Yufeng Zhang, Weiwei Sun, Yuning Jiang, Lei Li, Wei-Ying Ma
The Neuro-Symbolic Concept Learner:
Interpreting Scenes, Words, and Sentences From Natural Supervision
Jiayuan Mao, Chuang Gan, Pushmeet Kohli, Joshua B. Tenenbaum, Jiajun Wu
ICLR 2019 (Oral) Paper / Project Page / Code / MIT News / MIT Technology Review
Neural Logic Machines
Honghua Dong*, Jiayuan Mao*, Tian Lin, Chong Wang, Lihong Li, Dengyong Zhou
Neural Phrase-to-Phrase Machine Translation
Jiangtao Feng, Lingpeng Kong, Po-Sen Huang, Chong Wang, Da Huang, Jiayuan Mao, Kan Qiao, Dengyong Zhou
ArXiv Preprint Paper
Acquisition of Localization Confidence for Accurate Object Detection
Borui Jiang*, Ruixuan Luo*, Jiayuan Mao*, Tete Xiao, Yuning Jiang
Learning Visually-Grounded Semantics from Contrastive Adversarial Samples
Haoyue Shi*, Jiayuan Mao*, Tete Xiao*, Yuning Jiang, Jian Sun
Universal Agent for Disentangling Environments and Tasks
Jiayuan Mao, Honghua Dong, Joseph J. Lim
What Can Help Pedestrian Detection?
Jiayuan Mao*, Tete Xiao*, Yuning Jiang, Zhimin Cao
Dedicated to my best friend Zhaoyi