Davidsonian Scene Graph: Improving Reliability in Fine-Grained Evaluation for Text-to-Image Generation
Abstract
Evaluating text-to-image models is notoriously difficult. A strong recent approach for assessing text-image faithfulness is based on QG/A (question generation and answering), which uses pre-trained foundation models to automatically generate a set of questions and answers from the prompt; output images are then scored by checking whether the answers extracted with a visual question answering (VQA) model are consistent with the prompt-based answers. This kind of evaluation is naturally dependent on the quality of the underlying QG and QA models. We identify and address several reliability challenges in existing QG/A work: (a) QG questions should respect the prompt (avoiding hallucinations, duplications, and omissions) and (b) VQA answers should be consistent (not asserting that there is no motorcycle in an image while also claiming the motorcycle is blue). We address these issues with Davidsonian Scene Graph (DSG), an empirically grounded evaluation framework inspired by formal semantics. DSG is an automatic, graph-based QG/A framework that is modularly implemented to be adaptable to any QG/A module. DSG produces atomic and unique questions organized in dependency graphs, which (i) ensure appropriate semantic coverage and (ii) sidestep inconsistent answers. With extensive experimentation and human evaluation on a range of model configurations (LLM, VQA, and T2I), we empirically demonstrate that DSG addresses the challenges noted above. Finally, we present DSG-1k, an open-sourced evaluation benchmark that includes 1,060 prompts covering a wide range of fine-grained semantic categories with a balanced distribution. We will release the DSG-1k prompts and the corresponding DSG questions.
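The dependency-graph scoring described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the released implementation: vqa_answer stands in for whichever VQA model is plugged in, the question texts and graph structure are hypothetical, and treating a skipped child question as unsatisfied is one reasonable reading of how child answers are discounted when a parent fails.

from dataclasses import dataclass, field

@dataclass
class DSGQuestion:
    qid: str
    text: str                                      # atomic yes/no question derived from the prompt
    parents: list = field(default_factory=list)    # qids this question depends on

def score_image(questions, vqa_answer):
    """Score one image against a prompt's dependency graph of atomic questions.

    vqa_answer(question_text) -> bool is a placeholder for any VQA model.
    A child question is only asked if all of its parent questions were answered
    "yes"; otherwise it is counted as unsatisfied. This avoids inconsistent
    answers such as claiming the motorcycle is blue after denying there is a
    motorcycle at all.
    """
    answers = {}
    for q in questions:  # assumes questions are listed in topological order
        if all(answers.get(p, False) for p in q.parents):
            answers[q.qid] = vqa_answer(q.text)
        else:
            answers[q.qid] = False  # parent failed: skip and count as not satisfied
    return sum(answers.values()) / len(answers)

# Hypothetical DSG for the prompt "a blue motorcycle parked by paint-chipped doors":
questions = [
    DSGQuestion("q1", "Is there a motorcycle?"),
    DSGQuestion("q2", "Is the motorcycle blue?", parents=["q1"]),
    DSGQuestion("q3", "Are there doors?"),
    DSGQuestion("q4", "Is the paint on the doors chipped?", parents=["q3"]),
]

The returned value is the fraction of atomic questions satisfied by the image, so a higher score indicates closer alignment between the generated image and the prompt.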
QG/A: New Paradigm in T2I Alignment Eval
Reliability Issues in Existing QG/A Methods
DSG Solution to the Reliability Issues
Publication
@inproceedings{JaeminCho2024,
  author    = {Jaemin Cho and Yushi Hu and Roopal Garg and Peter Anderson and Ranjay Krishna and Jason Baldridge and Mohit Bansal and Jordi Pont-Tuset and Su Wang},
  title     = {{Davidsonian Scene Graph: Improving Reliability in Fine-Grained Evaluation for Text-to-Image Generation}},
  booktitle = {ICLR},
  year      = {2024}
}