Publications – Google Research
Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

1 - 15 of 10501 publications
DORA 2025 State of AI-assisted Software Development Report
Derek DeBellis
Matt Beane
Edward Fraser
Ben Good
Eirini Kalliamvakou
Gene Kim
Daniella Villalba
DORA, Google (2025)
In 2025, the central question for technology leaders is no longer if they should adopt AI, but how to realize its value. DORA’s research includes more than 100 hours of qualitative data and survey responses from nearly 5,000 technology professionals from around the world. The research reveals a critical truth: AI’s primary role in software development is that of an amplifier. It magnifies the strengths of high-performing organizations and the dysfunctions of struggling ones.
Mind the GAP: Geometry Aware Passthrough Mitigates Cybersickness
Trishia Chemaly
Mohit Goyal
Sakar Khattar
Bjorn Vlaskamp
Aveek Purohit
Konstantine Tsotsos
2025
Virtual Reality headsets isolate users from the real world by restricting their perception to the virtual world. Video See-Through (VST) headsets address this by utilizing world-facing cameras to create Augmented Reality experiences. However, directly displaying camera feeds can cause visual discomfort and cybersickness due to inaccurate perception of scale and exaggerated motion parallax. This paper presents initial findings on the potential of geometry-aware passthrough systems to mitigate cybersickness through enhanced depth perception. We introduce a promising protocol for quantitatively measuring the cybersickness experienced by users of VST headsets. Using this protocol, we conduct a user study comparing direct passthrough and geometry-aware passthrough systems. To the best of our knowledge, our study is the first to reveal reduced nausea, disorientation, and total cybersickness scores with geometry-aware passthrough. It also uncovers several potential avenues for further mitigating visually induced discomfort.
Binamix -- A Python Library for Generating Binaural Audio Datasets
Dan Barry
Davoud Shariat Panah
Alessandro Ragano
Andrew Hines
AES 158th Audio Engineering Society Convention (2025)
The increasing demand for spatial audio in applications such as virtual reality, immersive media, and spatial audio research necessitates robust solutions to generate binaural audio datasets for use in testing and validation. Binamix is an open-source Python library designed to facilitate programmatic binaural mixing using the extensive SADIE II Database, which provides Head Related Impulse Response (HRIR) and Binaural Room Impulse Response (BRIR) data for 20 subjects. The Binamix library provides a flexible and repeatable framework for creating large-scale spatial audio datasets, making it an invaluable resource for codec evaluation, audio quality metric development, and machine learning model training. A range of pre-built example scripts, utility functions, and visualization plots further streamline the process of custom pipeline creation. This paper presents an overview of the library’s capabilities, including binaural rendering, impulse response interpolation, and multi-track mixing for various speaker layouts. The tools utilize a modified Delaunay triangulation technique to achieve accurate HRIR/BRIR interpolation where desired angles are not present in the data. By supporting a wide range of parameters such as azimuth, elevation, subject Impulse Responses (IRs), speaker layouts, mixing controls, and more, the library enables researchers to create large binaural datasets for any downstream purpose. Binamix empowers researchers and developers to advance spatial audio applications with reproducible methodologies by offering an open-source solution for binaural rendering and dataset generation. We release the library under the Apache 2.0 License at https://github.com/QxLabIreland/Binamix/
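The Delaunay-based interpolation described in the abstract comes down to blending the impulse responses measured at a triangle's vertices, weighted by the barycentric coordinates of the query direction. A minimal pure-Python sketch of that blending step, using toy directions and made-up 4-tap impulse responses rather than Binamix's actual API or SADIE II data:

```python
def barycentric_weights(p, a, b, c):
    # Barycentric coordinates of direction p inside triangle (a, b, c);
    # each point is an (azimuth, elevation) pair in degrees.
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    denom = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    wa = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / denom
    wb = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / denom
    return wa, wb, 1.0 - wa - wb

def interpolate_hrir(p, tri_points, tri_hrirs):
    # Blend the three vertex HRIRs, weighted by barycentric coordinates.
    wa, wb, wc = barycentric_weights(p, *tri_points)
    return [wa * ha + wb * hb + wc * hc for ha, hb, hc in zip(*tri_hrirs)]

# Toy measured directions and made-up impulse responses (not SADIE II data).
pts = [(0.0, 0.0), (30.0, 0.0), (15.0, 30.0)]
hrirs = [[1.0, 0.5, 0.2, 0.0],
         [0.0, 1.0, 0.4, 0.1],
         [0.2, 0.2, 1.0, 0.3]]

print(interpolate_hrir((0.0, 0.0), pts, hrirs))   # a vertex reproduces its own HRIR
print(interpolate_hrir((15.0, 10.0), pts, hrirs)) # an interior direction blends all three
```

In a full pipeline the triangle itself would come from a Delaunay triangulation of all measured directions, and the blend would run per-ear and per-sample.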
A Recipe for Improving Remote Sensing Zero Shot Generalization
Aviad Barzilai
Yotam Gigi
Vered Silverman
Yehonathan Refael
Bolous Jaber
Amr Helmy
3rd ML4RS Workshop at ICLR 2025
Foundation models have had a significant impact across various AI applications, enabling use cases that were previously impossible. Vision-language models (VLMs), in particular, have outperformed other techniques in many tasks. In remote sensing (RS), foundation models have shown improvements across various applications. However, unlike in other fields, the use of VLMs with large-scale remote sensing image-text datasets remains limited.
In this work, we first introduce two novel image-caption datasets for training remote sensing foundation models. The first dataset pairs aerial and satellite imagery, aligned with Google Maps data, with high-quality captions generated using Gemini. The second utilizes public web images and their corresponding alt-text, filtered to the remote sensing domain, resulting in a highly diverse dataset.
We show that using these datasets to pre-train MaMMUT, a VLM architecture, results in state-of-the-art generalization performance in zero-shot classification and cross-modal retrieval on well-known public benchmarks. Second, we leverage this newly pre-trained VLM to generate inference attention maps for a novel class query (i.e., a class unseen during training). We then propose an iterative self-supervised fine-tuning approach in which samples aligned with these attention maps are iteratively pseudo-labeled and used for model training.
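The iterative pseudo-labeling loop in the last paragraph can be sketched generically: score each unlabeled sample, promote those above a confidence threshold into the training set, and repeat. The `score_fn` below is a hypothetical stand-in for the attention-map alignment scoring, and the sample names and scores are toy data:

```python
def pseudo_label_rounds(pool, score_fn, threshold=0.9, max_rounds=3):
    # Each round: score the remaining pool, promote high-confidence samples
    # into the (pseudo-labeled) training set, and rescore what is left.
    train = []
    for _ in range(max_rounds):
        scored = [(score_fn(s, train), s) for s in pool]
        keep = [s for conf, s in scored if conf >= threshold]
        if not keep:
            break
        train += keep
        pool = [s for conf, s in scored if conf < threshold]
    return train, pool

samples = [("img_a", 0.95), ("img_b", 0.88), ("img_c", 0.70)]
# Hypothetical confidence: a base score plus a bonus as the train set grows,
# mimicking a model that improves after each fine-tuning round.
score = lambda s, train: s[1] + 0.05 * len(train)

train, remaining = pseudo_label_rounds(samples, score)
print([name for name, _ in train], [name for name, _ in remaining])
```

In the paper's setting the scores would come from the pre-trained VLM's attention maps for the novel class query, and each round would include actual fine-tuning.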
An Empirical Study of Time of Day Breakpoints in Traffic Light Plans
Eliav Buchnik
Tom Kalvari
Jack Haddad
Dan Karliner
Danny Veikherman
Shai Ferster
Ori Rottenstreich
2025
Fixed-time strategy is a common approach in traffic signal control in which signal plans are simple and periodic, enjoying easy implementation without detection mechanisms. A traffic light is associated with several daily plans, each applied over several consecutive hours. Time-of-day breakpoints (TODs) are the times of day at which the plan changes. TODs are often selected based on traffic, aiming to divide the day into groups of consecutive hours with similar traffic characteristics within each group. We present a methodology for studying time-of-day breakpoints in practice, and use it to estimate and analyze time-of-day breakpoints in the city of Rio de Janeiro, Brazil based on traffic properties derived from traffic trajectories. Our study examines over 900 of the city's intersections. We consider properties such as the number of daily plans and the times at which plans start. We also provide traffic-aware insights on the potential for improvement in the selection of TODs and identify key intersections where adjusting TODs could reduce average delay times. We identify potential improvements in over 8% of the examined intersections. These findings provide valuable insights for traffic engineers seeking to optimize signal timing.
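One simple way to formalize the TOD selection problem described above is to partition the 24 hourly traffic values into k contiguous segments that minimize within-segment squared error, solvable exactly by dynamic programming. This is an illustrative formalization on toy data, not necessarily the paper's exact method:

```python
def sse(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

def best_breakpoints(hourly, k):
    # Split the hourly series into k contiguous segments minimizing total
    # within-segment squared error; returns the starting hour of each segment.
    n = len(hourly)
    cost = [[0.0] * (n + 1) for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n + 1):
            cost[i][j] = sse(hourly[i:j])   # cost of one segment covering hours i..j-1
    INF = float("inf")
    dp = [[INF] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0.0
    for g in range(1, k + 1):
        for j in range(1, n + 1):
            for i in range(g - 1, j):
                c = dp[g - 1][i] + cost[i][j]
                if c < dp[g][j]:
                    dp[g][j], back[g][j] = c, i
    starts, j = [], n
    for g in range(k, 0, -1):   # walk back through the optimal cuts
        j = back[g][j]
        starts.append(j)
    return sorted(starts)

# Toy day: low overnight traffic, a busy daytime plateau, low evening traffic.
hourly = [10] * 7 + [50] * 12 + [10] * 5
print(best_breakpoints(hourly, 3))   # [0, 7, 19]: plans start at 00:00, 07:00, 19:00
```

Real deployments would replace the single hourly scalar with richer trajectory-derived traffic features, but the segmentation structure is the same.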
SMaCk: Efficient Instruction Cache Attacks via Self-Modifying Code Conflicts
Seonghun Son
Berk Gulmezoglu
ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) (2025)
Self-modifying code (SMC) allows programs to alter their own instructions, optimizing performance and functionality on x86 processors. Despite its benefits, SMC introduces unique microarchitectural behaviors that can be exploited for malicious purposes. In this paper, we explore the security implications of SMC by examining how specific x86 instructions affecting instruction cache lines lead to measurable timing discrepancies between cache hits and misses. These discrepancies facilitate refined cache attacks, making them less noisy and more effective. We introduce novel attack techniques that leverage these timing variations to enhance existing methods such as Prime+Probe and Flush+Reload. Our advanced techniques allow adversaries to mount more precise attacks on cryptographic keys and create covert channels akin to Spectre across various x86 platforms. Finally, we propose a dynamic detection methodology utilizing hardware performance counters to mitigate these enhanced threats.
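The hit/miss timing discrepancy such cache attacks rely on is typically exploited by calibrating a latency threshold and classifying each timed probe against it. A toy sketch of that classification step, with made-up cycle counts standing in for real timed memory accesses:

```python
def calibrate_threshold(hit_latencies, miss_latencies):
    # Midpoint between the mean cache-hit and mean cache-miss latency (cycles).
    mean_hit = sum(hit_latencies) / len(hit_latencies)
    mean_miss = sum(miss_latencies) / len(miss_latencies)
    return (mean_hit + mean_miss) / 2

def classify(latency, threshold):
    # A probe faster than the threshold indicates the line was cached.
    return "hit" if latency < threshold else "miss"

hits = [32, 30, 34, 31]         # toy timed reloads of cached lines
misses = [210, 190, 205, 220]   # toy accesses that went to DRAM
th = calibrate_threshold(hits, misses)
print(th, classify(45, th), classify(180, th))
```

The paper's contribution is making these two latency distributions better separated (less noisy) via SMC-induced instruction-cache effects, which makes this simple thresholding more reliable.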
"It is important to consult" a linguist: Verb-Argument Constructions in ChatGPT and human experts' medical and financial advice
Chris Stewart
Alistair Windsor
J. Elliott Casal
PLOS One (2025)
This paper adopts a Usage-Based Construction Grammar perspective to compare human- and AI-generated language, focusing on Verb-Argument Constructions (VACs) as a lens for analysis. Specifically, we examine solicited advice texts in two domains—Finance and Medicine—produced by humans and by ChatGPT across different GPT models (3.5, 4, and 4o) and interfaces (3.5 Web vs. 3.5 API). Our findings reveal broad consistency in the frequency and distribution of the most common VACs across human- and AI-generated texts, though ChatGPT exhibits a slightly higher reliance on the most frequent constructions. A closer examination of the verbs occupying these constructions uncovers significant differences in the meanings conveyed, with newer models drifting away from human-like language production on macro-level measures (e.g., length) while moving toward human-like verb-VAC patterns. These results underscore the potential of VACs as a powerful tool for analyzing AI-generated language and tracking its evolution over time.
YETI (YET to Intervene) Proactive Interventions by Multimodal AI Agents in Augmented Reality Tasks
Saptarashmi Bandyopadhyay
Vikas Bahirwani
Lavisha Aggarwal
Bhanu Guda
Lin Li
Andrea Colaco
2025
Multimodal AI Agents are AI models that can interactively and cooperatively assist human users with day-to-day tasks. Augmented Reality (AR) head-worn devices can uniquely improve the user experience of solving procedural day-to-day tasks by providing AI Agents with egocentric multimodal (audio and video) observational capabilities. Such AR capabilities can help AI Agents see and hear the actions users take, paralleling the multimodal capabilities of human users. Existing AI Agents, whether Large Language Models (LLMs) or Multimodal Vision-Language Models (VLMs), are reactive in nature: they cannot take an action without first reading or listening to the human user's prompt. Proactivity in AI Agents, on the other hand, can help the human user detect and correct mistakes in agent-observed tasks, encourage users when they do tasks correctly, or simply engage in conversation with the user, akin to a human teaching or assisting a user. Our proposed YET to Intervene (YETI) multimodal Agent focuses on the research question of identifying circumstances in which the Agent should intervene proactively. This allows the Agent to understand when it can intervene in a conversation with a human user to help the user correct mistakes on tasks, like cooking, using Augmented Reality. Our YETI Agent learns scene-understanding signals based on interpretable notions of Structural Similarity (SSIM) between consecutive video frames. We also define an alignment signal by which the AI Agent can learn to identify whether the video frames corresponding to the user's actions are consistent with the expected actions of the task. These signals are used by our AI Agent to determine when it should proactively intervene. We compare our results on the instances of proactive intervention in the HoloAssist multimodal benchmark, in which an expert agent guides a user agent to complete procedural tasks.
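The frame-to-frame SSIM signal mentioned above can be illustrated with a single-window SSIM in pure Python. Production systems (and presumably YETI) compute SSIM over local windows, e.g. via scikit-image; one global window on toy flattened frames is enough to show how the score drops when the scene changes:

```python
def global_ssim(x, y, L=255.0):
    # Single-window SSIM between two flattened grayscale frames, using the
    # standard constants C1 = (0.01 L)^2 and C2 = (0.03 L)^2.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx * mx + my * my + c1) * (vx + vy + c2))

frame_t  = [10, 40, 90, 120, 200, 30]   # toy flattened frame at time t
frame_t1 = [10, 40, 90, 120, 200, 30]   # unchanged scene
frame_t2 = [200, 10, 30, 90, 120, 250]  # large scene change
print(global_ssim(frame_t, frame_t1))         # 1.0: no intervention cue
print(global_ssim(frame_t, frame_t2) < 1.0)   # True: drop signals a scene change
```

An agent could then treat a sustained SSIM drop across consecutive frames as one input feature when deciding whether to intervene.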
Probing non-equilibrium topological order on a quantum processor
Melissa Will
Tyler Cochran
Bernhard Jobst
Norhan Eassa
Michael Knap
Adam Gammon-Smith
Frank Pollmann
Nature, 645 (2025), 348–353
Out-of-equilibrium phases in many-body systems constitute a new paradigm in quantum matter—they exhibit dynamical properties that may otherwise be forbidden by equilibrium thermodynamics. Among these non-equilibrium phases are periodically driven (Floquet) systems, which are generically difficult to simulate classically because of their high entanglement. Here we realize a Floquet topologically ordered state on an array of superconducting qubits. We image the characteristic dynamics of its chiral edge modes and characterize its emergent anyonic excitations. Devising an interferometric algorithm allows us to introduce and measure a bulk topological invariant to probe the dynamical transmutation of anyons for system sizes up to 58 qubits. Our work demonstrates that quantum processors can provide key insights into the thus-far largely unexplored landscape of highly entangled non-equilibrium phases of matter.
Calibration Properties of Time-Series Foundation Models: An Empirical Analysis
Coen Adler
Samar Abdi
Yuxin Chang
Padhraic Smyth
2025
Recent development of foundation models for time series data has generated considerable interest in using such models across a variety of applications. Although they achieve state-of-the-art predictive performance, their ability to produce well-calibrated probabilistic distributions is critical for practical applications and remains relatively underexplored. In this paper, we investigate the calibration-related properties of five recent time series foundation models and two competitive baselines. We perform systematic evaluations and identify significant variation in calibration performance across models.
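A standard way to probe the calibration of probabilistic forecasts is to compare the nominal coverage of a prediction interval with its empirical coverage on held-out data. A minimal sketch with toy intervals and outcomes; the paper's evaluation protocol is likely richer than this single-level check:

```python
def empirical_coverage(intervals, actuals):
    # Fraction of actual values that land inside their predicted intervals.
    hits = sum(lo <= a <= hi for (lo, hi), a in zip(intervals, actuals))
    return hits / len(actuals)

# Toy 80%-nominal prediction intervals from some forecaster, with outcomes.
intervals = [(0, 10), (5, 15), (20, 30), (0, 8), (12, 18)]
actuals   = [4, 16, 25, 7, 13]

cov = empirical_coverage(intervals, actuals)
gap = abs(cov - 0.80)   # calibration gap at the 80% nominal level
print(cov, gap)
```

Repeating this at several nominal levels (10%, 20%, ..., 90%) and averaging the gaps gives a simple scalar calibration error for comparing models.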
Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting
Zilong Wang
Steven Zheng
Swaroop Mishra
Yuwei Zhang
Anush Mattapalli
Ankur Taly
Jingbo Shang
ICLR 2025
Retrieval augmented generation (RAG) has attracted a lot of attention across both academia and industry due to its ability to insert timely and accurate evidence into the generation of large language models. However, the retrieved evidence significantly lengthens the input prompt, which can harm the understanding quality of large language models and slow them down in actual usage scenarios. To address these issues, we propose Speculative RAG, which leverages a smaller LLM to conduct retrieval augmented generation on behalf of a larger LLM. The smaller LLM can digest a few pieces of evidence and rapidly generate multiple drafts in parallel; these drafts are then verified by the larger LLM to guarantee quality. We achieve higher speed as well as better quality in the RAG results.
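The draft-then-verify control flow can be sketched with stub models: partition the evidence into subsets, have the small model write one draft per subset, and have the large model only score the drafts. The `drafter` and `verifier` below are hypothetical stand-ins; a real system would call a small and a large LLM:

```python
def speculative_rag(question, evidence, drafter, verifier, n_subsets=3):
    # Draft-then-verify: the small model writes one draft per evidence
    # subset (parallelizable); the large model only scores drafts.
    subsets = [evidence[i::n_subsets] for i in range(n_subsets)]
    drafts = [drafter(question, subset) for subset in subsets]   # cheap small-LLM calls
    scored = [(verifier(question, d), d) for d in drafts]        # one scoring pass each
    return max(scored, key=lambda pair: pair[0])[1]

# Hypothetical stand-ins for the two models.
drafter = lambda q, ev: f"{q}: " + "; ".join(ev)
verifier = lambda q, d: len(d)   # toy score: prefer the most detailed draft

evidence = ["doc1", "doc2", "doc3", "doc4", "doc5"]
print(speculative_rag("why is the sky blue", evidence, drafter, verifier))
```

The speedup comes from the large model never reading the full evidence set: it sees only short drafts, so its prompt stays small while drafting runs in parallel on the cheap model.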
ESAM++: Efficient Online 3D Perception on the Edge
Qin Liu
Lavisha Aggarwal
Vikas Bahirwani
Lin Li
Aleksander Holynski
Saptarashmi Bandyopadhyay
Zhengyang Shen
Marc Niethammer
Ehsan Adeli
Andrea Colaco
2025
Online 3D scene perception in real time is critical for robotics, AR/VR, and autonomous systems, particularly in edge computing scenarios where computational resources are limited. Recent state-of-the-art methods like EmbodiedSAM (ESAM) demonstrate the promise of online 3D perception by leveraging a 2D visual foundation model (VFM) with efficient 3D query lifting and merging. However, ESAM depends on a computationally expensive sparse 3D U-Net for point cloud feature extraction, which we identify as the primary efficiency bottleneck. In this paper, we propose a lightweight and scalable alternative for online 3D scene perception tailored to edge devices. Our method introduces a 3D Sparse Feature Pyramid Network (SFPN) that efficiently captures multi-scale geometric features from streaming 3D point clouds while significantly reducing computational overhead and model size. We evaluate our approach on four challenging segmentation benchmarks—ScanNet, ScanNet200, SceneNN, and 3RScan—demonstrating that our model achieves competitive accuracy with up to 3× faster inference and a 3× smaller model size compared to ESAM, enabling practical deployment in real-world edge scenarios. Code and models will be released.
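The feature-pyramid pattern behind an SFPN (pool features to coarser scales bottom-up, then fuse coarse context back into finer scales top-down) can be shown on a toy 1-D feature map. This is a sketch of the generic FPN idea only, not the paper's sparse 3-D implementation:

```python
def avg_pool2(xs):
    # Halve the resolution by averaging adjacent pairs.
    return [(xs[i] + xs[i + 1]) / 2 for i in range(0, len(xs) - 1, 2)]

def upsample2(xs):
    # Double the resolution by repeating each value.
    return [x for v in xs for x in (v, v)]

def feature_pyramid(signal, levels=3):
    # Bottom-up: repeatedly pool to coarser scales.
    bottom_up = [signal]
    for _ in range(levels - 1):
        bottom_up.append(avg_pool2(bottom_up[-1]))
    # Top-down: upsample the coarser map and fuse (add) into the finer one.
    fused = bottom_up[-1]
    out = [fused]
    for finer in reversed(bottom_up[:-1]):
        fused = [f + u for f, u in zip(finer, upsample2(fused))]
        out.append(fused)
    return out  # coarse-to-fine, each level fused with coarser context

pyr = feature_pyramid([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
print([len(level) for level in pyr])  # [2, 4, 8]
```

A sparse 3-D version replaces the dense pooling/upsampling with sparse-voxel operators and learned convolutions, but the multi-scale extract-then-fuse structure is the same.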
Reasoning-SQL: Reinforcement Learning with Partial Rewards for Reasoning-Enhanced Text-to-SQL
Mohammadreza Pourreza
Shayan Talaei
Hailong Li
Azalia Mirhoseini
Amin Saberi
Conference on Language Modeling (COLM) (2025) (to appear)
Text-to-SQL is a challenging task involving multiple reasoning-intensive subtasks, including natural language understanding, database schema comprehension, and precise SQL query formulation. Existing approaches often rely on handcrafted reasoning paths with inductive biases that can limit their overall effectiveness. Motivated by the recent success of reasoning-enhanced models such as DeepSeek R1 and OpenAI o1, which effectively leverage reward-driven self-exploration to enhance reasoning capabilities and generalization, we propose a novel set of partial rewards tailored specifically for the Text-to-SQL task. Our reward set includes schema-linking, AI feedback, n-gram similarity, and syntax check, explicitly designed to address the reward sparsity issue prevalent in reinforcement learning (RL). Leveraging group relative policy optimization (GRPO), our approach explicitly encourages large language models (LLMs) to develop the intrinsic reasoning skills necessary for accurate SQL query generation. With models of different sizes, we demonstrate that RL-only training with our proposed rewards consistently achieves higher accuracy and superior generalization compared to supervised fine-tuning (SFT). Remarkably, our RL-trained 14B-parameter model significantly outperforms larger proprietary models, e.g., o3-mini by 4% and Gemini-1.5-Pro-002 by 3%, on the BIRD benchmark. These results highlight the efficacy of our proposed RL training framework with partial rewards for enhancing both accuracy and reasoning capabilities in Text-to-SQL tasks.
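Two of the four partial rewards named above can be sketched concretely: a syntax-check reward (here approximated by asking SQLite's parser to plan the query against a scratch schema) and an n-gram similarity reward (here bigram Jaccard overlap with the gold query). These are simplified stand-ins, not the paper's exact reward definitions, and the schema is a made-up example:

```python
import sqlite3

def syntax_reward(sql, schema="CREATE TABLE users (id INT, name TEXT);"):
    # 1.0 if the query parses and plans against a scratch schema, else 0.0.
    con = sqlite3.connect(":memory:")
    con.executescript(schema)
    try:
        con.execute("EXPLAIN " + sql)
        return 1.0
    except sqlite3.Error:
        return 0.0
    finally:
        con.close()

def ngram_reward(sql, gold, n=2):
    # Jaccard overlap of token bigrams between the candidate and gold SQL.
    def grams(s):
        toks = s.lower().split()
        return set(zip(toks, toks[1:]))
    a, b = grams(sql), grams(gold)
    return len(a & b) / len(a | b) if a | b else 1.0

gold = "SELECT name FROM users WHERE id = 3"
print(syntax_reward("SELECT name FROM users WHERE id = 3"))  # 1.0
print(syntax_reward("SELEC name FRM users"))                 # 0.0
print(ngram_reward("SELECT name FROM users", gold))          # partial credit
```

Summing several such dense partial rewards gives the GRPO trainer a gradient signal even when the final execution-accuracy reward is zero, which is exactly the sparsity problem the paper targets.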
Circadian rhythm of heart rate and activity: a cross-sectional study
Maryam Khalid
Logan Schneider
Aravind Natarajan
Conor Heneghan
Karla Gleichauf
Chronobiology International (2025)
Background: Circadian rhythms are commonly observed in a number of physiological processes. Consumer wearable devices have made it possible to obtain continuous time series data from a large number of individuals. We study circadian rhythms from measurements of heart rate, movement, and sleep, from a cohort of nearly 20,000 participants over the course of 30 days.
Methods: Participation was restricted to Fitbit users of age 21 years or older residing in the United States or Canada. Participants were enrolled through a recruitment banner shown on the Fitbit App. The advertisement was shown to 531,359 Fitbit users, and 23,239 enrolled in the program. Of these, we obtained heart rate data from 19,350 participants. We obtain the underlying circadian rhythm from the heart rate time series by modeling the circadian rhythm as a sum over the first two Fourier harmonics. The first Fourier harmonic accounts for the 24-hour rhythmicity, while the second harmonic accounts for non-sinusoidal perturbations.
Findings: We observe a circadian rhythm in both heart rate and acceleration. From the diurnal modulation, we obtain the following circadian parameters: (i) amplitude of modulation, (ii) bathyphase, (iii) acrophase, (iv) non-sinusoidal fraction, and (v) fraction of the day when the heart rate is greater than the mean. The amplitude, bathyphase, and acrophase depend on sex and decrease with age. The waketime, on average, follows the bathyphase by 2.4 hours. In most individuals, the circadian rhythm of heart rate lags the circadian rhythm of activity.
Interpretation: Circadian metrics for heart rate and activity can be reliably obtained from commercially available wearable devices. Distributions of circadian metrics can be valuable tools for individual-level interpretation.
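The two-harmonic Fourier model in the Methods section can be illustrated on synthetic data: for an evenly sampled series, the harmonic coefficients are direct projections, and the amplitude and acrophase fall out of the first harmonic. This is an illustrative sketch on fabricated heart rate values, not the paper's fitting pipeline:

```python
from math import cos, sin, pi, atan2, sqrt

def fourier_harmonics(samples, period=24.0, k_max=2):
    # Project a series, evenly sampled over one period starting at t = 0,
    # onto its first k_max Fourier harmonics; returns (mean, [(a_k, b_k), ...]).
    n = len(samples)
    mean = sum(samples) / n
    coeffs = []
    for k in range(1, k_max + 1):
        w = 2 * pi * k / period
        a = 2 / n * sum((x - mean) * cos(w * period * i / n) for i, x in enumerate(samples))
        b = 2 / n * sum((x - mean) * sin(w * period * i / n) for i, x in enumerate(samples))
        coeffs.append((a, b))
    return mean, coeffs

# Synthetic heart rate: mean 60 bpm with a 5 bpm 24-hour rhythm peaking at 16:00.
hr = [60 + 5 * cos(2 * pi / 24 * (24 * i / 48 - 16)) for i in range(48)]

mean, ((a1, b1), _) = fourier_harmonics(hr)
amplitude = sqrt(a1 ** 2 + b1 ** 2)                 # ~5 bpm modulation
acrophase = (atan2(b1, a1) * 24 / (2 * pi)) % 24    # ~16 h: time of the daily peak
print(round(mean, 3), round(amplitude, 3), round(acrophase, 3))
```

The bathyphase (time of the daily minimum) follows the same recipe with the phase shifted by half a period; the second harmonic captures how non-sinusoidal the rhythm is.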
Towards Conversational AI for Disease Management
Khaled Saab
David Stutz
Kavita Kulkarni
Sara Mahdavi
Joelle Barral
James Manyika
Ryutaro Tanno
Adam Rodman
arXiv (2025)
While large language models (LLMs) have shown promise in diagnostic dialogue, their capabilities for effective management reasoning - including disease progression, therapeutic response, and safe medication prescription - remain under-explored. We advance the previously demonstrated diagnostic capabilities of the Articulate Medical Intelligence Explorer (AMIE) through a new LLM-based agentic system optimised for clinical management and dialogue, incorporating reasoning over the evolution of disease and multiple patient visit encounters, response to therapy, and professional competence in medication prescription. To ground its reasoning in authoritative clinical knowledge, AMIE leverages Gemini's long-context capabilities, combining in-context retrieval with structured reasoning to align its output with relevant and up-to-date clinical practice guidelines and drug formularies. In a randomized, blinded virtual Objective Structured Clinical Examination (OSCE) study, AMIE was compared to 21 primary care physicians (PCPs) across 100 multi-visit case scenarios designed to reflect UK NICE Guidance and BMJ Best Practice guidelines. AMIE was non-inferior to PCPs in management reasoning as assessed by specialist physicians and scored better in both preciseness of treatments and investigations, and in its alignment with and grounding of management plans in clinical guidelines. To benchmark medication reasoning, we developed RxQA, a multiple-choice question benchmark derived from two national drug formularies (US, UK) and validated by board-certified pharmacists. While AMIE and PCPs both benefited from the ability to access external drug information, AMIE outperformed PCPs on higher difficulty questions. While further research would be needed before real-world translation, AMIE's strong performance across evaluations marks a significant step towards conversational AI as a tool in disease management.