This repository contains scripts and tools for generating and evaluating datasets.
The main script for generating datasets is `generate_dataset_with_qa.py`.
The main script for getting answers from LVLMs in JSON format is `run_dataset_dqa.py`.
For evaluation, you can check out the notebook `notebooks/Evaluation DQA` if you're feeling adventurous.
```bash
cd dataset_generation_with_options
python generate_dataset_with_qa.py icon_dataset_relationship_directionless_1000 1000 relationship_directionless
```

- `python generate_dataset_with_qa.py`: Runs the `generate_dataset_with_qa` script.
- `icon_dataset_relationship_directionless_1000`: The name of the dataset; this is how it will be saved.
- `1000`: The size of the dataset.
- `relationship_directionless`: The key for the diagram type. The options right now are `[image, text, abs_position, rel_position, relationship_directionless]`; `image` here means the icon setting. For more information you can explore the code. (A variant run is sketched after this list.)
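For example, assuming the same argument pattern, a run with the `text` diagram type might look like this (the dataset name and size here are illustrative):

```bash
python generate_dataset_with_qa.py icon_dataset_text_100 100 text
```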
First, look into `constants.py` and add the necessary diagram folder paths and information.
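As a loose sketch of the kind of entry this implies (all key names and paths below are hypothetical; mirror the existing entries in `constants.py` for the real structure):

```python
# Hypothetical sketch only -- copy the shape of the real entries in constants.py.
DATASETS = {
    "icon_dqa_relationship_directionless": {
        # Folder containing the generated diagram images and QA pairs.
        "path": "dataset_generation_with_options/icon_dataset_relationship_directionless_1000",
        # Composite image used for few-shot (ICL) prompting; needed by --fewshot.
        "fewshot_image": "fewshots/relationship_directionless_composite.png",
    },
}
```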
For GPT models: set `GPT_API_KEY` in your environment. For Gemini models: you need to log in to your account.
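On Linux/macOS, setting the key could look like this (the value is a placeholder; how you log in for Gemini depends on your account setup):

```bash
export GPT_API_KEY="..."  # placeholder -- use your real API key
```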
To run the dataset question answering, use the following command:

```bash
python -m run_dataset_dqa "icon_dqa_relationship_directionless" "gpt-4o" --log_path "gpt4o_outputs/icon_dqa/gpt4o_relationship_directionless_evaluation.json" --cot --fewshot
```

- `python -m run_dataset_dqa`: Executes the `run_dataset_dqa` module.
- `"icon_dqa_relationship_directionless"`: The key of the dataset to evaluate; this key is looked up in `constants.py`.
- `"gpt-4o"`: The model to use for evaluation. Can be picked from `["gpt-4o", "gpt-4-vision-preview", "gemini-1.5-pro"]` (a Gemini variant is sketched after this list).
- `--log_path "gpt4o_outputs/icon_dqa/gpt4o_relationship_directionless_evaluation_cot.json"`: The path where the log file will be saved. You can put whatever you want here; just make sure you don't overwrite an existing file.
- `--cot`: Use this flag if you want chain-of-thought (CoT) prompting.
- `--fewshot`: Use this flag if you want in-context learning (ICL). Make sure the dataset has a few-shot composite image specified in `constants.py`.
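For instance, the same evaluation with Gemini and no CoT or few-shot prompting could be run like this (the log path is illustrative):

```bash
python -m run_dataset_dqa "icon_dqa_relationship_directionless" "gemini-1.5-pro" --log_path "gemini_outputs/icon_dqa/gemini_relationship_directionless_evaluation.json"
```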
I recommend checking out the notebook `notebooks/Evaluation DQA`, but if you want to get results separately:
```bash
cd evaluation
python -m dqa_evaluation 'path' --q_component entity_abstract --q_type exist
```

- `python -m dqa_evaluation`: Executes the `dqa_evaluation` module.
- `'path'`: The path of the LVLM's output JSON file.
- `--q_component entity_abstract`: The specific component you want to evaluate (for Foodwebs).
- `--q_type count`: The specific type of question you want to evaluate (for the synthetic dataset); can be picked from `[count, existence]`. (An example run follows this list.)
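For example, to score only the counting questions from the earlier GPT-4o run (the path is illustrative and should point at your own output file):

```bash
python -m dqa_evaluation 'gpt4o_outputs/icon_dqa/gpt4o_relationship_directionless_evaluation.json' --q_type count
```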
Ensure you have the following dependencies installed:
- Python 3.x (Python 3.11.5 in my case)
```bash
pip install -r requirements.txt
```
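If you want an isolated environment, a standard virtual environment setup also works (optional; the activation command shown is for Linux/macOS):

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```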