This repository contains scripts and tools for generating and evaluating datasets.
The main script for generating datasets is `generate_dataset_with_qa.py`.
The main script for getting answers from LVLMs in JSON format is `run_dataset_dqa.py`.
For evaluation, you can check out the notebook `notebooks/Evaluation DQA` if you're feeling adventurous.
```bash
cd dataset_generation_with_options
python generate_dataset_with_qa.py icon_dataset_relationship_directionless_1000 1000 relationship_directionless
```

- `python generate_dataset_with_qa.py`: Runs the `generate_dataset_with_qa` script.
- `icon_dataset_relationship_directionless_1000`: The name of the dataset; this is how it will be saved.
- `1000`: The size of the dataset.
- `relationship_directionless`: The key for the diagram type. The options right now are `[image, text, abs_position, rel_position, relationship_directionless]`; `image` here means the icon setting. For more information you can explore the code. (A variant run is sketched after this list.)
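For example, assuming the same argument pattern, a run with the `text` diagram type might look like this (the dataset name and size here are illustrative):

```bash
python generate_dataset_with_qa.py icon_dataset_text_100 100 text
```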
First, look into `constants.py` and add the necessary diagram folder paths and information.
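As a loose sketch of the kind of entry this implies (all key names and paths below are hypothetical; mirror the existing entries in `constants.py` for the real structure):

```python
# Hypothetical sketch only -- copy the shape of the real entries in constants.py.
DATASETS = {
    "icon_dqa_relationship_directionless": {
        # Folder containing the generated diagram images and QA pairs.
        "path": "dataset_generation_with_options/icon_dataset_relationship_directionless_1000",
        # Composite image used for few-shot (ICL) prompting; needed by --fewshot.
        "fewshot_image": "fewshots/relationship_directionless_composite.png",
    },
}
```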
For GPT models: set `GPT_API_KEY` in your environment. For Gemini models: you need to log in to your account.
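On Linux/macOS, setting the key could look like this (the value is a placeholder; how you log in for Gemini depends on your account setup):

```bash
export GPT_API_KEY="..."  # placeholder -- use your real API key
```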
To run the dataset question answering, use the following command:

```bash
python -m run_dataset_dqa "icon_dqa_relationship_directionless" "gpt-4o" --log_path "gpt4o_outputs/icon_dqa/gpt4o_relationship_directionless_evaluation.json" --cot --fewshot
```

- `python -m run_dataset_dqa`: Executes the `run_dataset_dqa` module.
- `"icon_dqa_relationship_directionless"`: The key of the dataset to evaluate; this key is looked up in `constants.py`.
- `"gpt-4o"`: The model to use for evaluation. Can be picked from `["gpt-4o", "gpt-4-vision-preview", "gemini-1.5-pro"]` (a Gemini variant is sketched after this list).
- `--log_path "gpt4o_outputs/icon_dqa/gpt4o_relationship_directionless_evaluation_cot.json"`: The path where the log file will be saved. You can put whatever you want here; just make sure you don't overwrite an existing file.
- `--cot`: Use this flag if you want chain-of-thought (CoT) prompting.
- `--fewshot`: Use this flag if you want in-context learning (ICL). Make sure the dataset has a few-shot composite image specified in `constants.py`.
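For instance, the same evaluation with Gemini and no CoT or few-shot prompting could be run like this (the log path is illustrative):

```bash
python -m run_dataset_dqa "icon_dqa_relationship_directionless" "gemini-1.5-pro" --log_path "gemini_outputs/icon_dqa/gemini_relationship_directionless_evaluation.json"
```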
I recommend checking out the notebook `notebooks/Evaluation DQA`, but if you want to get results separately:
```bash
cd evaluation
python -m dqa_evaluation 'path' --q_component entity_abstract --q_type exist
```

- `python -m dqa_evaluation`: Executes the `dqa_evaluation` module.
- `'path'`: The path of the LVLM's output JSON file.
- `--q_component entity_abstract`: The specific component you want to evaluate (for Foodwebs).
- `--q_type count`: The specific type of question you want to evaluate (for the synthetic dataset); can be picked from `[count, existence]`. (An example run follows this list.)
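For example, to score only the counting questions from the earlier GPT-4o run (the path is illustrative and should point at your own output file):

```bash
python -m dqa_evaluation 'gpt4o_outputs/icon_dqa/gpt4o_relationship_directionless_evaluation.json' --q_type count
```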
Ensure you have the following dependencies installed:
- Python 3.x (Python 3.11.5 in my case)
```bash
pip install -r requirements.txt
```
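If you want an isolated environment, a standard virtual environment setup also works (optional; the activation command shown is for Linux/macOS):

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```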