OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation
†Project lead,
‡Corresponding authors
1Shanghai Jiao Tong University,
2StepFun
- [2025-09-19] 🌟 OneIG-Bench has been accepted to NeurIPS 2025 DB Track.
- [2025-09-19] 🌟 We updated the evaluation results of Seedream 4.0, Gemini-2.5-Flash-Image (Nano Banana), Step-3o Vision and HunyuanImage-2.1.
- [2025-09-19] 🌟 We updated the official evaluation results of NextStep-1, Lumina-DiMOO and IRG.
- [2025-08-13] 🌟 We updated the official evaluation results of Qwen-Image.
- [2025-07-03] 🌟 We updated the evaluation results of Ovis-U1.
- [2025-06-25] 🌟 We updated the evaluation results of Show-o2 and OmniGen2.
- [2025-06-23] 🌟 We released the T2I generation script here.
- [2025-06-10] 🌟 We released the OneIG-Bench benchmark on 🤗 Hugging Face.
- [2025-06-10] 🌟 We released the tech report and the project page.
- [2025-06-10] 🌟 We released the evaluation scripts.
OneIG-Bench is a comprehensive benchmark framework for fine-grained evaluation of T2I models across multiple dimensions.
Note: If you would like to submit your results, please contact us at oneigbench@outlook.com.
Introduction
Text-to-image (T2I) models have garnered significant attention for generating high-quality images aligned with text prompts. However, rapid progress in T2I models has exposed the limitations of early benchmarks, which lack comprehensive evaluation of capabilities such as reasoning, text rendering, and stylization. Notably, recent state-of-the-art models, with their rich knowledge-modeling capabilities, show promising results on image generation problems that require strong reasoning, yet existing evaluation systems have not adequately addressed this frontier. To systematically address these gaps, we introduce OneIG-Bench, a meticulously designed benchmark framework for fine-grained evaluation of T2I models across multiple dimensions: prompt-image alignment, text rendering precision, reasoning-generated content, stylization, and diversity. By structuring the evaluation along these dimensions, the benchmark enables in-depth analysis of model performance, helping researchers and practitioners pinpoint strengths and bottlenecks in the full image generation pipeline. In particular, OneIG-Bench supports flexible evaluation: instead of generating images for the entire prompt set, users can generate images only for the prompts associated with a selected dimension and run the corresponding evaluation.
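As a concrete sketch of this subset-based workflow (not the released evaluation code), the Python below filters a prompt file down to a single dimension before generation. The CSV file name, column names, and dimension labels are illustrative assumptions; consult the released T2I generation script and the Hugging Face dataset for the actual schema.

```python
"""Minimal sketch of dimension-filtered generation on OneIG-Bench.

Assumptions (not the benchmark's actual schema): prompts live in a CSV
with columns "id", "prompt", and "dimension", and the dimension labels
mirror the paper's five axes. Swap in the real benchmark files and
your own T2I model.
"""
import csv
import os

from PIL import Image

# Hypothetical label; pick one of: alignment, text_rendering, reasoning, style, diversity.
SELECTED_DIMENSION = "text_rendering"


def load_prompts(path: str, dimension: str) -> list[dict]:
    """Keep only the prompts tagged with the chosen evaluation dimension."""
    with open(path, newline="", encoding="utf-8") as f:
        return [row for row in csv.DictReader(f) if row["dimension"] == dimension]


def generate_image(prompt: str) -> Image.Image:
    """Placeholder for any T2I model call (a diffusers pipeline, an API, ...)."""
    return Image.new("RGB", (1024, 1024))  # blank stand-in image


if __name__ == "__main__":
    out_dir = os.path.join("outputs", SELECTED_DIMENSION)
    os.makedirs(out_dir, exist_ok=True)
    for row in load_prompts("oneig_bench_prompts.csv", SELECTED_DIMENSION):
        # Save as <id>.png so the dimension-specific evaluation script can pick it up.
        generate_image(row["prompt"]).save(os.path.join(out_dir, f"{row['id']}.png"))
```

Generating only the selected subset keeps the cost of evaluating a single dimension proportional to that dimension's prompt count rather than the full benchmark.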
Benchmark Comparison
Unless otherwise specified, OneIG-Bench refers to OneIG-Bench-EN in the subsequent results sections.
Method Comparison Radar Charts on OneIG-Bench
Leaderboard (Methods & Scores for Each Metric)
BibTeX
@article{chang2025oneig,
  title={OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation},
  author={Jingjing Chang and Yixiao Fang and Peng Xing and Shuhan Wu and Wei Cheng and Rui Wang and Xianfang Zeng and Gang Yu and Hai-Bao Chen},
  journal={arXiv preprint arXiv:2506.07977},
  year={2025}
}