MathVerse
Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
University of California, Los Angeles
Introduction
The remarkable progress of Multi-modal Large Language Models (MLLMs) has garnered unparalleled attention due to their superior performance in visual contexts. However, their capabilities in visual math problem-solving remain insufficiently evaluated and understood. We find that current benchmarks incorporate excessive visual content within textual questions, which potentially assists MLLMs in deducing answers without truly interpreting the input diagrams.
To this end, we introduce MathVerse, an all-around visual math benchmark designed for an equitable and in-depth evaluation of MLLMs. We meticulously collect 2,612 high-quality, multi-subject math problems with diagrams from publicly available sources. Each problem is then transformed by human annotators into six distinct versions, each offering varying degrees of information content in multi-modality, contributing to 15K test samples in total.
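The six versions range from Text Dominant (most conditions stated in the text) to Vision Only (everything rendered inside the diagram), matching the columns of the leaderboard below. As a minimal illustration of how released samples could be regrouped by their original problem, here is a short Python sketch; the `problem_id` and `problem_version` field names are assumptions for illustration and may differ from the released files:

```python
from collections import defaultdict

# The six problem versions used in MathVerse, ordered from most to least
# textual information content.
VERSIONS = [
    "Text Dominant",
    "Text Lite",
    "Text Only",
    "Vision Intensive",
    "Vision Dominant",
    "Vision Only",
]

def group_by_problem(samples):
    """Collect the six versions of each problem under its original problem id.

    `samples` is assumed to be an iterable of dicts with (hypothetical)
    "problem_id" and "problem_version" keys.
    """
    grouped = defaultdict(dict)
    for sample in samples:
        grouped[sample["problem_id"]][sample["problem_version"]] = sample
    return grouped
```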
This approach allows MathVerse to comprehensively assess whether, and how much, MLLMs can truly understand the visual diagrams for mathematical reasoning. In addition, we propose a Chain-of-Thought (CoT) evaluation strategy for a fine-grained assessment of the output answers. Rather than naively judging True or False, we employ GPT-4(V) to adaptively extract crucial reasoning steps and then score each step with detailed error analysis, revealing the intermediate CoT reasoning quality of MLLMs.
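To make the two-stage idea concrete, below is a minimal sketch of such a CoT evaluation pipeline using the OpenAI Python client as the judge. The prompts, model name, scoring scale, and aggregation by mean are illustrative assumptions, not the official MathVerse evaluation code:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

EXTRACT_PROMPT = (
    "Extract the crucial reasoning steps from the following solution, "
    "one step per line:\n\n{response}"
)
SCORE_PROMPT = (
    "Question: {question}\nGround-truth answer: {answer}\n"
    "Reasoning step: {step}\n"
    "Score this step from 0 (wrong) to 1 (correct) and briefly explain any error. "
    "Reply as '<score> | <explanation>'."
)

def judge(prompt: str, model: str = "gpt-4o") -> str:
    """Send one judging prompt to the LLM judge and return its reply."""
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return reply.choices[0].message.content

def cot_score(question: str, answer: str, response: str) -> float:
    """Stage 1: extract key steps; stage 2: score each step; return the mean."""
    steps = [s for s in judge(EXTRACT_PROMPT.format(response=response)).splitlines() if s.strip()]
    scores = []
    for step in steps:
        verdict = judge(SCORE_PROMPT.format(question=question, answer=answer, step=step))
        # Assumes the judge follows the requested '<score> | <explanation>' format.
        scores.append(float(verdict.split("|")[0].strip()))
    return sum(scores) / max(len(scores), 1)
```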
With MathVerse, we unveil that most existing MLLMs struggle to understand math diagrams, relying heavily on textual questions. Surprisingly, some of them even achieve over 5% higher accuracy without the visual input, e.g., Qwen-VL-Max and InternLM-XComposer2.
In contrast, GPT-4V and ShareGPT4V demonstrate relatively better comprehension of the visual content for mathematical reasoning. We hope the MathVerse benchmark may provide unique insights to guide the future development of MLLMs.
Leaderboard
Accuracy scores on the testmini subset of MathVerse.
| # | Model | Method | Source | Date | ALL (CoT-E) | ALL (w/o) | Text Dominant (CoT-E) | Text Dominant (w/o) | Text Lite (CoT-E) | Text Lite (w/o) | Text Only (CoT-E) | Text Only (w/o) | Vision Intensive (CoT-E) | Vision Intensive (w/o) | Vision Dominant (CoT-E) | Vision Dominant (w/o) | Vision Only (CoT-E) | Vision Only (w/o) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | VL-Rethinker-72B | MLLM 🖼️ | Link | 2025-04-10 | - | 61.7 | - | - | - | - | - | - | - | - | - | - | - | - |
| 2 | VL-Rethinker-7B | MLLM 🖼️ | Link | 2025-04-10 | - | 54.2 | - | - | - | - | - | - | - | - | - | - | - | - |
| 3 | GPT-4V | MLLM 🖼️ | Link | 2023-12-26 | 54.4 | 39.4 | 63.1 | 54.7 | 56.6 | 41.4 | 60.3 | 48.7 | 51.4 | 34.9 | 50.8 | 34.4 | 50.3 | 31.6 |
| 4 | Qwen-VL-Max | MLLM 🖼️ | Link | 2023-12-26 | 37.2 | 25.3 | 42.8 | 30.7 | 37.7 | 26.1 | 47.9 | 28.9 | 33.6 | 24.1 | 35.9 | 24.1 | 35.9 | 21.4 |
| 5 | LLaVA-NeXT-34B | MLLM 🖼️ | Link | 2024-01-30 | 34.6 | 23.8 | 49.0 | 33.8 | 37.6 | 25.5 | 30.1 | 21.3 | 35.2 | 23.5 | 28.9 | 20.3 | 22.4 | 15.7 |
| 6 | Gemini-Pro | MLLM 🖼️ | Link | 2023-12-26 | 35.3 | 23.5 | 39.8 | 26.3 | 34.7 | 23.5 | 44.5 | 27.3 | 32.0 | 23.0 | 36.8 | 22.3 | 33.3 | 22.2 |
| 7 | G-LLaVA-13B | MLLM 🖼️ | Link | 2023-12-26 | 16.4 | 16.9 | 23.6 | 21.2 | 18.9 | 20.9 | 22.6 | 22.3 | 17.1 | 17.7 | 13.5 | 14.8 | 8.9 | 10.0 |
| 8 | G-LLaVA-7B | MLLM 🖼️ | Link | 2023-12-26 | 15.7 | 16.6 | 22.2 | 20.9 | 20.4 | 20.7 | 21.6 | 21.1 | 16.5 | 17.2 | 12.7 | 14.6 | 6.6 | 9.4 |
| 9 | InternLM-XComposer2-VL-7B | MLLM 🖼️ | Link | 2024-01-22 | 25.9 | 16.5 | 36.9 | 22.3 | 28.3 | 17.0 | 42.5 | 16.5 | 20.1 | 15.7 | 24.4 | 16.4 | 19.8 | 11.0 |
| 10 | LLaVA-NeXT-13B | MLLM 🖼️ | Link | 2024-01-13 | 17.2 | 15.6 | 21.6 | 19.4 | 19.7 | 15.2 | 25.1 | 18.1 | 17.6 | 16.8 | 14.9 | 15.2 | 12.1 | 11.3 |
| 11 | SPHINX-MoE | MoE 🤖 | Link | 2023-10-03 | 22.8 | 15.0 | 33.3 | 22.2 | 21.9 | 16.4 | 40.7 | 18.3 | 21.1 | 14.8 | 19.6 | 12.6 | 18.3 | 9.1 |
| 12 | ShareGPT4V-13B | MLLM 🖼️ | Link | 2023-12-26 | 17.4 | 13.1 | 21.8 | 16.2 | 20.6 | 16.2 | 14.6 | 6.6 | 18.6 | 15.5 | 16.2 | 13.8 | 9.7 | 3.7 |
| 13 | SPHINX-Plus | MLLM 🖼️ | Link | 2023-12-26 | 14.0 | 12.2 | 16.3 | 13.9 | 12.8 | 11.6 | 15.8 | 14.9 | 12.9 | 11.6 | 14.7 | 13.5 | 13.2 | 10.4 |
| 14 | Qwen-VL-Plus | MLLM 🖼️ | Link | 2023-12-26 | 21.3 | 11.8 | 26.0 | 15.7 | 21.2 | 11.1 | 25.2 | 14.5 | 18.5 | 9.0 | 19.1 | 13.0 | 21.8 | 10.0 |
| 15 | MiniGPT-v2-7B | MLLM 🖼️ | Link | 2023-12-26 | 10.9 | 11.0 | 13.2 | 12.1 | 12.7 | 12.0 | 15.3 | 11.7 | 11.1 | 13.1 | 11.3 | 10.3 | 6.4 | 7.4 |
| 16 | ImageBind-LLM | MLLM 🖼️ | Link | 2023-12-26 | 10.0 | 9.2 | 13.2 | 11.4 | 11.6 | 11.3 | 12.9 | 11.7 | 9.8 | 8.9 | 11.8 | 11.2 | 3.5 | 3.4 |
| 17 | LLaVA-1.5-13B | MLLM 🖼️ | Link | 2023-12-26 | 12.7 | 7.6 | 17.1 | 8.8 | 12.0 | 7.6 | 22.6 | 11.5 | 12.6 | 7.4 | 12.7 | 7.4 | 9.0 | 6.9 |
| 18 | mPLUG-Owl2-7B | MLLM 🖼️ | Link | 2023-12-26 | 10.3 | 5.9 | 11.6 | 6.6 | 11.4 | 6.3 | 13.8 | 6.1 | 11.1 | 6.3 | 9.4 | 5.6 | 8.0 | 4.9 |
| 19 | LLaMA-Adapter V2 | MLLM 🖼️ | Link | 2023-12-26 | 5.8 | 5.7 | 7.8 | 6.2 | 6.3 | 5.9 | 3.9 | 2.7 | 6.2 | 6.1 | 4.5 | 4.2 | 4.4 | 6.1 |
| - | ChatGPT | LLM 📄 | Link | 2023-10-03 | - | - | 51.3 | 33.3 | 38.5 | 18.9 | 51.3 | 33.3 | - | - | - | - | - | - |
| - | GPT-4 | LLM 📄 | Link | 2023-10-03 | - | - | 63.4 | 46.5 | 40.7 | 20.7 | 63.4 | 46.5 | - | - | - | - | - | - |
| - | Human Performance* | - | Link | 2023-10-03 | - | 64.9 | - | 71.2 | - | 70.9 | - | 41.7 | - | 61.4 | - | 68.3 | - | 66.7 |
| - | Random Chance | - | Link | 2023-10-03 | - | 12.4 | - | 12.4 | - | 12.4 | - | 12.4 | - | 12.4 | - | 12.4 | - | 12.4 |
Method types: MLLM 🖼️: Multi-modal Large Language Model; MoE 🤖: Mixture of Experts; LLM 📄: Large Language Model. Score types: CoT-E denotes scores under the proposed CoT evaluation strategy; w/o denotes accuracy without CoT evaluation.
MathVerse Dataset
Overview
MathVerse is a holistic and specialized visual math benchmark crafted to evaluate the multi-modal mathematical reasoning skills of MLLMs. The benchmark encompasses a meticulously collected dataset of 2,612 visual math problems, with 1,236 newly acquired from public question repositories and 1,376 selected from existing benchmarks, ensuring a diverse range of challenges. To specialize in mathematical reasoning, MathVerse spans three primary areas: plane geometry, solid geometry, and functions. Each problem has been rigorously reviewed by expert annotators and classified into 12 detailed categories, emphasizing different fine-grained problem-solving capabilities. Notably, MathVerse distinguishes itself by introducing two novel strategies for evaluating MLLMs.
Examples of six versions of each problem in MathVerse.
Three categories of question texts in MathVerse.
You can download the dataset from Hugging Face Datasets.
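A minimal loading sketch with the Hugging Face `datasets` library is shown below; the repository id and config name (`AI4Math/MathVerse`, `testmini`) are assumptions here, so please verify them on the dataset card:

```python
from datasets import load_dataset

# Repository id and config name are assumptions; verify them on the
# Hugging Face dataset card before running.
data = load_dataset("AI4Math/MathVerse", "testmini")

print(data)  # lists the available splits and their sizes
first_split = list(data.keys())[0]
print(data[first_split][0])  # inspect one sample record
```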
Key statistics of MathVerse.
Subject distribution of MathVerse. (Solid G: Solid Geometry, Plane G: Plane Geometry.)
Examples
One example for each subfield in MathVerse:
Plane Geometry
Solid Geometry
Functions
Comparison of the six problem versions in MathVerse.
Visualization
Experiment Results
Results on Existing Foundation Models
Distribution of GPT-4V's errors in reasoning and answers.
Distribution of GPT-4V's errors within different types.
Visualization Examples
Response comparison of GPT-4V, LLaVA-NeXT, and SPHINX-MoE. We adopt the Text Lite version of the problem and highlight the key-step extraction and scoring by the CoT evaluation strategy.
Response of Different Problem Versions by GPT-4V.