JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images

Overview

Welcome to the official repository for our NeurIPS 2024 Datasets and Benchmarks Track submission, JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images. This repository contains the code, models, evaluation metrics, and information related to our dataset and research paper.

Dataset Description

JourneyBench is a comprehensive dataset designed to rigorously assess the fine-grained multimodal reasoning abilities of state-of-the-art models using challenging, human-annotated generated images. The dataset includes five tasks: Multimodal Chain-of-Thought (MCOT), Multi-image VQA, Imaginary Image Captioning, VQA with Hallucination Triggers, and Fine-Grained Cross-Modal Retrieval with sample-specific distractors. JourneyBench fills a gap in existing benchmarks by presenting complex reasoning challenges in unusual and fictional visual contexts.

Structure

base-models: Implementation code for base models.
evaluation: Implementation code for the Multimodal Chain-of-Thought, Multi-image VQA, HaloQuest, and Imaginary Caption Generation evaluation metrics.
automatic-qa-generator: Implementation code for the Human-Machine-in-the-Loop framework used to generate initial sample-specific text distractors.
midjourney-scrapper: Implementation code for collecting Midjourney images.

Evaluation

Inside the evaluation folder, the eval_metrics.py file contains evaluation code for both the VQA v2 metric and conventional metrics such as BLEU, CIDEr, ROUGE, and METEOR.
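
For readers unfamiliar with the VQA v2 metric, the snippet below is a minimal sketch of its commonly cited simplified form: a prediction scores min(n / 3, 1), where n is the number of human annotators who gave that answer. It is not taken from eval_metrics.py; the function name vqa_accuracy and the lowercase/strip normalization are assumptions made for illustration. The official VQA evaluation script applies more thorough answer normalization and averages over annotator subsets.

```python
from collections import Counter
from typing import Sequence


def vqa_accuracy(prediction: str, human_answers: Sequence[str]) -> float:
    """Simplified VQA v2 accuracy: min(#matching human answers / 3, 1).

    Hypothetical helper for illustration; the normalization here
    (strip + lowercase) is an assumption, not the official procedure.
    """
    normalized = prediction.strip().lower()
    # Count how many annotators gave each (normalized) answer.
    counts = Counter(answer.strip().lower() for answer in human_answers)
    return min(counts[normalized] / 3.0, 1.0)


if __name__ == "__main__":
    answers = ["cat", "cat", "dog"] + ["bird"] * 7
    print(vqa_accuracy("cat", answers))
```

Under this rule an answer given by at least three annotators receives full credit, while answers given by one or two annotators receive partial credit.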

Automatic Question-Answer Data Generation

The automatic-qa-generator folder implements the Machine-Human-in-the-Loop approach used in our work, which employs LLMs and VLMs to generate a portion of our initial question-answer pair data. The framework is implemented following IdealGPT.

JourneyBench Data

TBD

Midjourney Image Scraping

Inside the midjourney-scrapper folder, the scrapper.py file downloads both top-voted and trending images from the publicly visible gallery; no login or session token is required. The images are stored in a new folder named with the current date in YYYYMMDD format.
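
The date-stamped folder naming described above can be sketched as follows. This is an illustrative example, not the code in scrapper.py; the function name dated_output_dir and its root and today parameters are hypothetical.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def dated_output_dir(root: str = ".", today: Optional[date] = None) -> Path:
    """Return root/YYYYMMDD for the given day (defaults to the current date).

    Hypothetical helper; the real scraper may build its path differently.
    """
    day = today if today is not None else date.today()
    # strftime("%Y%m%d") yields a zero-padded, sortable name such as 20240605.
    return Path(root) / day.strftime("%Y%m%d")


if __name__ == "__main__":
    out_dir = dated_output_dir("downloads")
    # Create the folder if it does not exist yet.
    out_dir.mkdir(parents=True, exist_ok=True)
    print(out_dir)
```

Because the folder name is zero-padded, runs from different days sort chronologically when listed alphabetically.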

License

Contributions

Zhecan Wang^♠, Junzhang Liu^♠, Chia-Wei Tang^†, Hani Alomari^†, Anushka Sivakumar^†, Rui Sun^♠, Wenhao Li^♠, Md. Atabuzzaman^†, Hammad Ayyubi^♠, Haoxuan You^♠, Alvi Ishmam^†, Kai-Wei Chang^♦, Shih-Fu Chang^♠, Chris Thomas^†

^♠ Columbia University, ^♦ UCLA, ^† Virginia Tech

Contact

For any inquiries, please contact us at journeybench.contact@gmail.com.

Thank you for your interest and patience. Please subscribe to our mailing list and stay tuned for updates!

ToDo List

Project Page
Open-source the JourneyBench dataset
Implement and share evaluation metrics
Develop and maintain a leaderboard for model performance
Host a workshop and competition at the upcoming CVPR conference
Extend the dataset with new instances and tasks
