Jing Yu Koh

jingyuk@cs.cmu.edu

I am a research scientist at TBD Lab of Meta Superintelligence Labs. I lead the computer use agents team, which develops general-purpose agents that can automate computer work.

I'm currently on a leave of absence from a PhD in machine learning at Carnegie Mellon University, where I'm advised by Daniel Fried and Ruslan Salakhutdinov. Prior to starting my PhD, I worked at Google Research in Jason Baldridge's team from 2019 to 2022, where I researched vision-and-language problems and generative models. Before that, I completed my undergraduate studies at the Singapore University of Technology and Design summa cum laude (highest honors) in 2019.

My first name is "Jing Yu" and informally I go by the nickname "JY". 许靖宇 is my name in Chinese. I'm from Singapore.

News

(Jan 2025) Gave an invited talk about Multimodal Computer Agents at Google DeepMind.
(Aug 2024) Gave invited talks about Tree Search for LM Agents at UW, CMU, and Meta.
(Jun 2024) Personal update: I've joined the Llama team at Meta to build multimodal agents!
(Jun 2024) Excited to share a new preprint: Tree Search for Language Model Agents.
(Spring 2024) Gave invited talks about VisualWebArena at NUS, Jane Street, and Cohere For AI (recording).
(Feb 2024) Awarded the Jane Street Graduate Research Fellowship. Thank you Jane Street!
(Jan 2024) Excited to share a new preprint: VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks.
(Sep 2023) 1 paper accepted to NeurIPS 2023!
(Summer 2023) Gave an invited talk about GILL at DLCT and Cohere For AI (slides, recording).
(Apr 2023) 1 paper accepted to ICML 2023!
(Spring 2023) Gave invited talks at Microsoft Research, Apple AI/ML, Georgia Tech, and the London ML Meetup (recording, slides).

Older news

(Dec 2022) I made a bet on LLM capabilities with my office mate Ben Chugg. Bubble tea is on the line.

(Nov 2022) Parti was accepted to TMLR with a Featured Certification!

(Oct 2022) In the spirit of paying it forward, I'm sharing my Statement of Purpose publicly. Hope it helps future applicants!

(Jul 2022) After 2.73 wonderful years at Google, I've left to pursue my PhD at Carnegie Mellon University!

(Jan 2022) 1 paper accepted to ICLR 2022!

(Dec 2021) Serving as a reviewer for CVPR 2022.

(Jul 2021) 1 paper accepted to ICCV 2021!

(Jul 2021) Presenting an invited talk at Microsoft Research.

(Jul 2021) Serving as a reviewer for NeurIPS 2021.

(Mar 2021) 1 paper accepted to CVPR 2021!

(Jan 2021) 1 paper accepted to ICLR 2021!

(Oct 2020) 1 paper accepted to WACV 2021!

(Jul 2020) 1 paper accepted to ECCV 2020!

(Oct 2019) Officially joined Google as an AI Resident in Mountain View, California.

Selected Publications [Google Scholar]

2025

Tree Search for Language Model Agents

Jing Yu Koh, Stephen McAleer, Daniel Fried, Ruslan Salakhutdinov

TMLR, 2025.

Project Page PDF Code & Data

2024

VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks

Jing Yu Koh, Robert Lo*, Lawrence Jang*, Vikram Duvvur*, Ming Chong Lim*, Po-Yu Huang*, Graham Neubig, Shuyan Zhou, Ruslan Salakhutdinov, Daniel Fried

ACL, 2024.

As seen on: Wired.

Project Page PDF Code & Data Talk Wired Article

2023

Generating Images with Multimodal Language Models

Jing Yu Koh, Daniel Fried, Ruslan Salakhutdinov

NeurIPS, 2023.

Project Page PDF Code Slides Talk

Grounding Language Models to Images for Multimodal Inputs and Outputs

Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried

ICML, 2023.

Project Page PDF Code Slides Talk

VQ3D: Learning a 3D-Aware Generative Model on ImageNet

Kyle Sargent, Jing Yu Koh, Han Zhang, Huiwen Chang, Charles Herrmann, Pratul Srinivasan, Jiajun Wu, Deqing Sun

ICCV (oral, best paper finalist), 2023.

Project Page PDF

Simple and Effective Synthesis of Indoor 3D Scenes

Jing Yu Koh*, Harsh Agrawal*, Dhruv Batra, Richard Tucker, Austin Waters, Honglak Lee, Yinfei Yang, Jason Baldridge, Peter Anderson (* denotes equal contribution)

AAAI, 2023.

PDF Code Video

2022

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

Jiahui Yu, Yuanzhong Xu, Jing Yu Koh, Thang Luong, Gunjan Baid, Zirui Wang, Vijay Vasudevan, Alexander Ku, Yinfei Yang, Burcu Karagol Ayan, Ben Hutchinson, Wei Han, Zarana Parekh, Xin Li, Han Zhang, Jason Baldridge, Yonghui Wu

TMLR, 2022.

Website PDF GitHub