We construct a new Large Vision-Language Model Knowledge Editing Benchmark, VLKEB, and extend the Portability metric for more comprehensive evaluation. Leveraging a multi-modal knowledge graph, the images in our dataset are bound to knowledge entities. These entities can further be used to extract entity-related knowledge, which forms the basis of the editing data.
The dataset is available on Kaggle. You can download it from the site or use the Kaggle API:
kaggle datasets download -d hymanh/vlkeb-data

We also provide a Hugging Face dataset as an alternative.
The dataset is organized as follows:
VLKEB/
├── VLKEB_images/          # image folder
│   ├── m.0104lr/          # image subfolder, named by entity ID
│   │   ├── google_15.jpg  # image file
│   │   └── ...
│   └── ...
│
├── train.json             # train file
├── eval.json              # evaluation file, without portability test
├── eval_multihop.json     # evaluation file, containing multi-hop portability
├── eval_edit_onehop.json  # evaluation file, edit one-hop knowledge for portability
│
└── LICENSE.txt            # license file

VLKEB includes a total of 8174 edits, divided into 5000 for training and 3174 for evaluation. A total of 18434 images are used in the Reliability, Generality, and Locality tests. The Portability test uses the same images as the Reliability test and comprises 4819 cases in total, distributed across 1-hop, 2-hop, 3-hop, and 4-hop categories with 1278, 1238, 1193, and 1110 cases, respectively.
|  | All (train/eval) |  | Rel. | Gen. | Loc. |
|---|---|---|---|---|---|
| #Edits | 8174 (5000/3174) | #Images | 8172 | 6627 | 3635 |
|  | **All (eval only)** | **1-hop** | **2-hop** | **3-hop** | **4-hop** |
| #Port. | 4819 | 1278 | 1238 | 1193 | 1110 |
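To get oriented with the files, here is a minimal loading sketch. It assumes the dataset has been extracted to a local `VLKEB/` folder as laid out above and that the JSON files can be read with the standard library; the record schema itself is defined by the files, so inspect a case before building on it.

```python
import json
from pathlib import Path

# Assumed local paths; adjust to wherever the dataset archive was extracted.
DATA_ROOT = Path("VLKEB")
IMAGE_ROOT = DATA_ROOT / "VLKEB_images"

# Load the evaluation split (the one without the portability test).
with open(DATA_ROOT / "eval.json", "r", encoding="utf-8") as f:
    eval_cases = json.load(f)
print(f"Loaded {len(eval_cases)} evaluation cases")

# Image subfolders are named by knowledge-entity ID, e.g. "m.0104lr".
entity_id = "m.0104lr"  # example entity from the folder layout above
entity_images = sorted((IMAGE_ROOT / entity_id).glob("*.jpg"))
print(f"{entity_id}: {len(entity_images)} images")
```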
Conda environments: we export conda environment files for running the code. Please carefully review the separate environments provided for different algorithms and models. Our experiments build on the great works listed in the Acknowledgments.
# To run the code for FT, IKE, MEND and SERAC on models blip2, minigpt4 and llava, use the following environment
conda env create -f envs/vlkeb_easyedit.yml
# To run the code for FT, IKE, MEND and SERAC on model qwen-vl, use the following environment
conda env create -f envs/vlkeb_qwenvl.yml
# To run the code for FT, IKE, MEND and SERAC on model owl-2, use the following environment
conda env create -f envs/vlkeb_owl2.yml
# To run the code for KE, use the following environment
conda env create -f envs/vlkeb_ke.yml

We provide the pre-trained SERAC, MEND, and KE models used in the paper.
The weights can be downloaded from Hugging Face directly or with the following commands.
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/HymanH/VLKEB-models

To run the code, we also need to download the pre-trained PyTorch models of the LVLMs and other components, then put them in the proper directories.
Here we put them under the 'hugging_cache' folder and the 'openai' folder:
# models in hugging_cache folder
hugging_cache/
├── all-MiniLM-L6-v2/
├── bert-base-uncased/
├── distilbert-base-cased/
├── Llama-2-7b-hf/
├── llava-v1.5-7b/
├── mplug-owl2-llama2-7b/
├── opt-2.7b/
├── opt-125m/
├── Qwen-7B/
├── Qwen-VL/
├── vicuna-7b/
├── vicuna-7b-v1.5/
│
├── blip2_pretrained_flant5xxl.pth
├── blip2_pretrained_opt2.7b.pth
├── eva_vit_g.pth
└── pretrained_minigpt4_7b.pth

# clip-vit model in openai folder
openai/
└── clip-vit-large-patch14-336/
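Before launching experiments, it can save time to confirm that the expected checkpoints and folders are actually in place. The sketch below is only a convenience check based on the layout above; the two root paths are assumptions to adjust to your setup.

```python
from pathlib import Path

# Assumed roots matching the layout above; adjust if your paths differ.
HUGGING_CACHE = Path("hugging_cache")
OPENAI_DIR = Path("openai")

# A few representative assets from the directory tree above.
REQUIRED = [
    HUGGING_CACHE / "vicuna-7b",
    HUGGING_CACHE / "llava-v1.5-7b",
    HUGGING_CACHE / "Qwen-VL",
    HUGGING_CACHE / "eva_vit_g.pth",
    HUGGING_CACHE / "pretrained_minigpt4_7b.pth",
    OPENAI_DIR / "clip-vit-large-patch14-336",
]

missing = [p for p in REQUIRED if not p.exists()]
if missing:
    print("Missing model assets:")
    for p in missing:
        print(f"  - {p}")
else:
    print("All checked model assets are in place.")
```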
Currently, the code for different experiments lives in different branches.
For the single-editing experiment, refer to the main branch. For the multi-hop and sequential editing experiments, refer to the multihop_and_sequential branch. For editing one-hop knowledge, refer to the edit_onehop branch.
For experiments with the KE method, refer to the main branch and go into the 'KE' subfolder.
The hyperparameters are all in the hparams folder, and detailed settings can be found in EasyEdit. Paths to models and data should be properly set in the config files, as illustrated in the sketch below.
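The exact hparams schema comes from EasyEdit and differs per method and model, so the snippet below is only a loose illustration: the file name is hypothetical, and it simply flags config values that look like local paths but do not exist yet.

```python
from pathlib import Path

import yaml  # provided by the PyYAML package

# Hypothetical example; point this at the actual file for your method/model under hparams/.
cfg_path = Path("hparams/MEND/blip2.yaml")

with open(cfg_path, "r", encoding="utf-8") as f:
    cfg = yaml.safe_load(f)  # assumed to be a flat key-value mapping

# Heuristically flag entries that look like local paths but are missing on disk.
for key, value in (cfg or {}).items():
    if isinstance(value, str) and ("/" in value or value.endswith((".pth", ".json"))):
        if not Path(value).exists():
            print(f"[check] {key}: {value} not found locally")
```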
To run the code, check the Python files under the root folder and run them as follows:
# at main branch
python multimodal_edit.py [FUNC_NAME] [HOP_NUM] # see .py file for function names
# at main branch, KE, can use bash scripts
./train_ke.sh [GPU_ID] [MODEL_NAME] # MODEL_NAME=[blip2, minigpt4, llava, qwen-vl, owl-2]
./test_ke.sh [GPU_ID] [MODEL_NAME] [CHECKPOINT_PATH] # test without portability
./test_multihop.sh [GPU_ID] [MODEL_NAME] [HOP_NUM] # HOP_NUM=[1, 2, 3, 4]
# at multihop_and_sequential branch
python test_base_portability.py [FUNC_NAME] [HOP_NUM] # test portability on unedited models
python test_multihop_portability.py [FUNC_NAME] [HOP_NUM]
python test_sequential_editing.py [FUNC_NAME] # hop num is 1
# at edit_onehop branch
python test_edit_onehop.py [FUNC_NAME]

If you find our project or dataset helpful to your research, please consider citing:
@misc{huang2024vlkeb,
title={VLKEB: A Large Vision-Language Model Knowledge Editing Benchmark},
author={Han Huang and Haitian Zhong and Tao Yu and Qiang Liu and Shu Wu and Liang Wang and Tieniu Tan},
year={2024},
eprint={2403.07350},
archivePrefix={arXiv}
}
GitHub (seen by all contributors): open a New Issue
Han Huang - han.huang@cripac.ia.ac.cn
Haitian Zhong - haitian.zhong@cripac.ia.ac.cn
We would like to thank the following projects and their great works for making this project possible: MMKG, EasyEdit, KnowledgeEditor, LAVIS (BLIP2), MiniGPT-4, LLaVA, Qwen-VL, mPLUG-Owl2.
We would also like to extend our gratitude to all the other projects and contributors in the open-source community whose work may not be directly listed here but has nonetheless been invaluable. Your innovations, tools, and libraries have greatly contributed to our project. We are immensely grateful for your work!
