We construct a new Large Vision-Language Model Knowledge Editing Benchmark, VLKEB, and extend the Portability metric for more comprehensive evaluation. Leveraging a multi-modal knowledge graph, the images in our dataset are bound to knowledge entities. These entities can further be used to extract entity-related knowledge, which forms the basis of the editing data.
The dataset is available on Kaggle. You can download it from the site or use the Kaggle API:
kaggle datasets download -d hymanh/vlkeb-data

We also provide a Hugging Face dataset as an alternative.
The dataset is organized as follows:
VLKEB/
├── VLKEB_images/          # image folder
│   ├── m.0104lr/          # image subfolder, named by entity ID
│   │   ├── google_15.jpg  # image file
│   │   └── ...
│   └── ...
│
├── train.json             # train file
├── eval.json              # evaluation file, without portability test
├── eval_multihop.json     # evaluation file, containing multi-hop portability
├── eval_edit_onehop.json  # evaluation file, edit one-hop knowledge for portability
│
└── LICENSE.txt            # license file

VLKEB includes a total of 8174 edits, divided into 5000 for training and 3174 for evaluation. A total of 18434 images are used in the Reliability, Generality, and Locality tests. The Portability test uses the same images as the Reliability test and comprises 4819 cases in total, distributed across 1-hop, 2-hop, 3-hop, and 4-hop categories with 1278, 1238, 1193, and 1110 cases, respectively.
|  | All (train/eval) |  | Rel. | Gen. | Loc. |
|---|---|---|---|---|---|
| #Edits | 8174 (5000/3174) | #Images | 8172 | 6627 | 3635 |
|  | **All (eval only)** | **1-hop** | **2-hop** | **3-hop** | **4-hop** |
| #Port. | 4819 | 1278 | 1238 | 1193 | 1110 |
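To get oriented with the files, here is a minimal loading sketch. It assumes the dataset has been extracted to a local `VLKEB/` folder as laid out above and that the JSON files can be read with the standard library; the record schema itself is defined by the files, so inspect a case before building on it.

```python
import json
from pathlib import Path

# Assumed local paths; adjust to wherever the dataset archive was extracted.
DATA_ROOT = Path("VLKEB")
IMAGE_ROOT = DATA_ROOT / "VLKEB_images"

# Load the evaluation split (the one without the portability test).
with open(DATA_ROOT / "eval.json", "r", encoding="utf-8") as f:
    eval_cases = json.load(f)
print(f"Loaded {len(eval_cases)} evaluation cases")

# Image subfolders are named by knowledge-entity ID, e.g. "m.0104lr".
entity_id = "m.0104lr"  # example entity from the folder layout above
entity_images = sorted((IMAGE_ROOT / entity_id).glob("*.jpg"))
print(f"{entity_id}: {len(entity_images)} images")
```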
Conda environments: we export conda environment files for running the code. Please carefully review the separate environments provided for different algorithms and models. Our experiments build on the great works listed in the Acknowledgments.
# To run the code for FT, IKE, MEND and SERAC on models blip2, minigpt4 and llava, use the following environment
conda env create -f envs/vlkeb_easyedit.yml
# To run the code for FT, IKE, MEND and SERAC on model qwen-vl, use the following environment
conda env create -f envs/vlkeb_qwenvl.yml
# To run the code for FT, IKE, MEND and SERAC on model owl-2, use the following environment
conda env create -f envs/vlkeb_owl2.yml
# To run the code for KE, use the following environment
conda env create -f envs/vlkeb_ke.yml

We provide the pre-trained SERAC, MEND, and KE models used in the paper.
The weights can be downloaded from Hugging Face directly or with the following commands.
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/HymanH/VLKEB-models

To run the code, we also need to download the pre-trained PyTorch models of the LVLMs and other components, then put them in the proper directories.
Here we put them under the 'hugging_cache' folder and the 'openai' folder:
# models in hugging_cache folder
hugging_cache/
├── all-MiniLM-L6-v2/
├── bert-base-uncased/
├── distilbert-base-cased/
├── Llama-2-7b-hf/
├── llava-v1.5-7b/
├── mplug-owl2-llama2-7b/
├── opt-2.7b/
├── opt-125m/
├── Qwen-7B/
├── Qwen-VL/
├── vicuna-7b/
├── vicuna-7b-v1.5/
│
├── blip2_pretrained_flant5xxl.pth
├── blip2_pretrained_opt2.7b.pth
├── eva_vit_g.pth
└── pretrained_minigpt4_7b.pth

# clip-vit model in openai folder
openai/
└── clip-vit-large-patch14-336/
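Before launching experiments, it can save time to confirm that the expected checkpoints and folders are actually in place. The sketch below is only a convenience check based on the layout above; the two root paths are assumptions to adjust to your setup.

```python
from pathlib import Path

# Assumed roots matching the layout above; adjust if your paths differ.
HUGGING_CACHE = Path("hugging_cache")
OPENAI_DIR = Path("openai")

# A few representative assets from the directory tree above.
REQUIRED = [
    HUGGING_CACHE / "vicuna-7b",
    HUGGING_CACHE / "llava-v1.5-7b",
    HUGGING_CACHE / "Qwen-VL",
    HUGGING_CACHE / "eva_vit_g.pth",
    HUGGING_CACHE / "pretrained_minigpt4_7b.pth",
    OPENAI_DIR / "clip-vit-large-patch14-336",
]

missing = [p for p in REQUIRED if not p.exists()]
if missing:
    print("Missing model assets:")
    for p in missing:
        print(f"  - {p}")
else:
    print("All checked model assets are in place.")
```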
Currently, the code for different experiments lives in different branches.
For the single-editing experiment, refer to the main branch. For the multi-hop and sequential editing experiments, refer to the multihop_and_sequential branch. For editing one-hop knowledge, refer to the edit_onehop branch.
For experiments with the KE method, refer to the main branch and go into the 'KE' subfolder.
The hyperparameters are all in the hparams folder, and detailed settings can be found in EasyEdit. Paths to models and data should be properly set in the config files, as illustrated in the sketch below.
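The exact hparams schema comes from EasyEdit and differs per method and model, so the snippet below is only a loose illustration: the file name is hypothetical, and it simply flags config values that look like local paths but do not exist yet.

```python
from pathlib import Path

import yaml  # provided by the PyYAML package

# Hypothetical example; point this at the actual file for your method/model under hparams/.
cfg_path = Path("hparams/MEND/blip2.yaml")

with open(cfg_path, "r", encoding="utf-8") as f:
    cfg = yaml.safe_load(f)  # assumed to be a flat key-value mapping

# Heuristically flag entries that look like local paths but are missing on disk.
for key, value in (cfg or {}).items():
    if isinstance(value, str) and ("/" in value or value.endswith((".pth", ".json"))):
        if not Path(value).exists():
            print(f"[check] {key}: {value} not found locally")
```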
To run the code, check the Python files under the root folder and run them as follows:
# at main branch
python multimodal_edit.py [FUNC_NAME] [HOP_NUM] # see .py file for function names
# at main branch, KE, can use bash scripts
./train_ke.sh [GPU_ID] [MODEL_NAME] # MODEL_NAME=[blip2, minigpt4, llava, qwen-vl, owl-2]
./test_ke.sh [GPU_ID] [MODEL_NAME] [CHECKPOINT_PATH] # test without portability
./test_multihop.sh [GPU_ID] [MODEL_NAME] [HOP_NUM] # HOP_NUM=[1, 2, 3, 4]
# at multihop_and_sequential branch
python test_base_portability.py [FUNC_NAME] [HOP_NUM] # test portability on unedited models
python test_multihop_portability.py [FUNC_NAME] [HOP_NUM]
python test_sequential_editing.py [FUNC_NAME] # hop num is 1
# at edit_onehop branch
python test_edit_onehop.py [FUNC_NAME]

If you find our project or dataset helpful to your research, please consider citing:
@misc{huang2024vlkeb,
title={VLKEB: A Large Vision-Language Model Knowledge Editing Benchmark},
author={Han Huang and Haitian Zhong and Tao Yu and Qiang Liu and Shu Wu and Liang Wang and Tieniu Tan},
year={2024},
eprint={2403.07350},
archivePrefix={arXiv}
}
GitHub (seen by all contributors): open a New Issue
Han Huang - han.huang@cripac.ia.ac.cn
Haitian Zhong - haitian.zhong@cripac.ia.ac.cn
We would like to thank the following projects and their great works for making this project possible: MMKG, EasyEdit, KnowledgeEditor, LAVIS (BLIP2), MiniGPT-4, LLaVA, Qwen-VL, mPLUG-Owl2.
We would also like to extend our gratitude to all the other projects and contributors in the open-source community whose work may not be directly listed here but has nonetheless been invaluable. Your innovations, tools, and libraries have greatly contributed to our project. We are immensely grateful for your work!
