About the project (the paper, including the appendix, is here)
We introduce ComprehendEdit, a comprehensive benchmark with enhanced metrics for multimodal knowledge editing. ComprehendEdit incorporates eight diverse tasks derived from multiple datasets, providing a more robust and varied evaluation framework. Two novel evaluation metrics are introduced: the Knowledge Generalization Index (KGI) and the Knowledge Preservation Index (KPI), which assess the impact of knowledge editing on in-domain samples. The distribution of question types in existing datasets (question types generated by Llama-2-7b-chat-hf) and in ComprehendEdit is shown in the following table:
Task | E-VQA | VLKEB | ComprehendEdit |
---|---|---|---|
Object Recognition | 4,854 | 8,089 | 2,962 |
Object Attributes | 1,435 | 27 | 2,987 |
Object Counting | 1,213 | 0 | 2,009 |
Object Existence | 845 | 3 | 1,962 |
Scene Information | 45 | 44 | 2,854 |
Numerical Inference | 23 | 0 | 846 |
Spatial Relationship | 16 | 1 | 2,239 |
Text Recognition | 8 | 0 | 2,073 |
Total | 8,439 | 8,164 | 17,932 |
ComprehendEdit focuses on evaluating the edited model on in-domain samples, as shown in the following figure:
Here are some samples of ComprehendEdit:
Q, G, P, S, and C stand for Question, Ground-truth, Prediction, Source, and task Category, respectively.
The dataset is organized as follows:
|——ComprehendEdit/
| |——GQA/
| | |——images/
| | | |——21.jpg
| | | |——...
| |——MathVista/
| | |——images/
| |——TallyQA/
| | |——VG_100K/
| | |——VG_100K_2/
| |——TextVQA/
| | |——train_images/
| |——VSR/
| | |——images/
| |——val2014/
|——ComprehendEdit_train.json
|——ComprehendEdit_test.json
|——ComprehendEdit_ori_right.json
The format of each sample in the test set is:
[{
"image": "GQA/images/2405722.jpg",
"question": "What is this bird called?",
"rephrase": "What is the bird's name?", # for Text-Generality
"answer": "parrot",
"source": "GQA",
"Category": "object recognition",
"pid": 0,
"img_topk": [...], # pid of the image topk nearest samples in test set
"txt_topk": [...], # pid of the text topk nearest samples in test set
"img_last_topk": [...], # pid of the image topk farthest samples in test set
"txt_last_topk": [...], # pid of the text topk farthest samples in test set
"ori_rt_img_topk": [...], # pid of the image topk nearest samples in ComprehendEdit_ori_right.json
"ori_rt_txt_topk": [...], # pid of the text topk nearest samples in ComprehendEdit_ori_right.json
"ori_rt_img_last_topk": [...], # pid of the image topk farthest samples in ComprehendEdit_ori_right.json
"ori_rt_txt_last_topk": [...], # pid of the text topk farthest samples in ComprehendEdit_ori_right.json
"locality_prompt": "when does twice upon a time come out", # for Text-Locality
"locality_ground_truth": "...",
"multimodal_locality_image": "...", # for Multimodal-Locality
"multimodal_locality_prompt": "...",
"multimodal_locality_ground_truth": "..."}, ...]
The details of ComprehendEdit are shown in the following table:
Task | Train | Test | Source |
---|---|---|---|
Object Recognition | 2,227 | 735 | GQA |
Object Attributes | 2,282 | 705 | GQA |
Object Counting | 1,506 | 503 | TallyQA |
Object Existence | 1,471 | 491 | GQA |
Scene Information | 2,067 | 787 | GQA |
Numerical Inference | 634 | 212 | MathVista |
Spatial Relationship | 1,709 | 530 | VSR |
Text Recognition | 1,554 | 519 | TextVQA |
Total | 13,450 | 4,482 | |
The ratio of training data to test data in each task is approximately 3:1. We also use samples from the NQ and OK-VQA datasets to measure text locality (T-L) and multimodal locality (M-L).
This dataset was collected from several benchmarks using BLIP-2 OPT 2.7B and MiniGPT-4 7B. If you want to run other models on ComprehendEdit, we recommend measuring the change in the top-10 predictions on locality samples before and after editing. We will update the results in the coming months.
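The exact protocol is up to you, but one simple way to quantify this is the overlap between the top-10 next-token predictions of the base and edited models on each locality sample. The helper below is a hypothetical sketch; how you obtain the logits depends on your backbone (BLIP-2, MiniGPT-4, ...).

```python
import torch

@torch.no_grad()
def top10_overlap(logits_before: torch.Tensor, logits_after: torch.Tensor) -> float:
    """Fraction of the pre-edit top-10 predictions that survive the edit.

    Both tensors are assumed to be 1-D logits over the vocabulary (e.g. for the
    answer's first token) of a single locality sample.
    """
    before = set(torch.topk(logits_before, k=10).indices.tolist())
    after = set(torch.topk(logits_after, k=10).indices.tolist())
    return len(before & after) / 10.0

# Averaging this overlap over all (multimodal) locality samples gives a score in [0, 1];
# values close to 1 mean the edit barely disturbed unrelated knowledge.
```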
The dataset can be downloaded from Baidu Netdisk or Google Drive. The project is built on EasyEdit. The ComprehendEdit dataset class is located in ComprehendEdit/easyeditor/dataset/ComprehendEdit.py, and you can import it just like E-VQA.
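For example, assuming the class follows the same constructor pattern as EasyEdit's other multimodal datasets (an annotation file plus a loaded hparams config; the YAML path below is only a placeholder), usage could look like this sketch:

```python
# Hypothetical sketch: the constructor signature is assumed to mirror
# EasyEdit's VQADataset-style multimodal datasets (annotation path + config).
from easyeditor import MENDMultimodalTrainingHparams
from easyeditor.dataset.ComprehendEdit import ComprehendEdit

hparams = MENDMultimodalTrainingHparams.from_hparams('hparams/TRAINING/MEND/minigpt4.yaml')
train_ds = ComprehendEdit('ComprehendEdit_train.json', config=hparams)
eval_ds = ComprehendEdit('ComprehendEdit_test.json', config=hparams)
```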
The conda environment is the one provided by EasyEdit for multimodal knowledge editing, and links to the pretrained model weights are provided in VLKEB.
To run the code, you can use the following command:
sh run_multi.sh # or python3 multimodal_edit_our.py
You can change the algorithm name in multimodal_edit_our.py to run other editing methods. For example,
train_HICE(model='blip2', train=True)
trains HICE on BLIP-2 OPT 2.7B. After training, simply set train=False to evaluate the model.
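So a typical train-then-evaluate sequence in multimodal_edit_our.py would be:

```python
train_HICE(model='blip2', train=True)   # train the HICE editor on BLIP-2 OPT 2.7B
train_HICE(model='blip2', train=False)  # then evaluate with the trained editor
```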
You can also change the hyperparameter YAML files in ComprehendEdit/hparams. For example, you can edit ComprehendEdit/hparams/TRAINING/HICE/minigpt4.yaml to choose which GPUs to run on, change the path to the pretrained model, and so on. In the YAML files, gpu_used_id and gpu_split are used to split the model across different GPUs.
If you want to run experiments on a single GPU, set model_parallel=False and gpu_split=[]. If you want to run experiments with other models, add the corresponding model settings in ComprehendEdit/easyeditor/util/tools.py. (Simply using device_map="auto" may cause out-of-memory on the main GPU when the dataset is too large, while spreading the model over too many GPUs wastes resources and takes more time.)
Thanks to EasyEdit for the framework! The samples in ComprehendEdit come from several datasets: GQA, TallyQA, VSR, TextVQA, MathVista, OK-VQA, and NQ. Part of the code references RanPAC. Thanks to all of these outstanding works!
Please cite our paper if you use ComprehendEdit in your work.