Efficient Token-Guided Image-Text Retrieval with Consistent Multimodal Contrastive Training

This repo is built on top of VSE++ and TERAN.

Setup

Set up the Python environment using conda:

conda env create --file environment.yml
conda activate gls
export PYTHONPATH=.

Get the data

Download and extract the data folder, which contains the annotations, the splits by Karpathy et al., and the precomputed ROUGE-L and SPICE relevances for both the COCO and Flickr30K datasets:

wget https://datino.isti.cnr.it/teran/data.tar
tar -xvf data.tar

Download the bottom-up features for both COCO and Flickr30K, which were extracted with the code by Anderson et al. The following commands unpack them under data/coco/ and data/f30k/. If you prefer another location, adjust the configuration file accordingly.

# for MS-COCO
wget https://datino.isti.cnr.it/teran/features_36_coco.tar
tar -xvf features_36_coco.tar -C data/coco
# for Flickr30K
wget https://datino.isti.cnr.it/teran/features_36_f30k.tar
tar -xvf features_36_f30k.tar -C data/f30k
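
After extraction, a quick check that the expected folders exist can save a failed run later. The sketch below is a minimal example that assumes only the default locations named above (data/coco and data/f30k). The helper name missing_data_dirs is hypothetical and not part of this repo, and it does not verify the contents of the folders.

```python
from pathlib import Path

# Default locations named in this README; adjust if you changed the config.
EXPECTED_DIRS = ("data/coco", "data/f30k")


def missing_data_dirs(root=".", expected=EXPECTED_DIRS):
    """Return the expected data folders that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_data_dirs()
    if missing:
        print("Missing data folders:", ", ".join(missing))
    else:
        print("All expected data folders are present.")
```

Run it from the repository root, since the paths are resolved relative to the current directory.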

Evaluate

Download and extract our pre-trained models.

Then run the following commands to evaluate a given model:

# F30K
python3 test.py runs/f30k_m0.3/model_best_rsum.pth.tar --config configs/f30k_global.yaml
python3 test.py runs/f30k_m0.3/model_best_rsum.pth.tar --config configs/f30k_local.yaml
python3 test_gl.py runs/f30k_m0.3/model_best_rsum.pth.tar --config configs/f30k_local.yaml
# COCO
python3 test.py runs/coco_m0.3/model_best_rsum.pth.tar --config configs/coco_global.yaml
python3 test.py runs/coco_m0.3/model_best_rsum.pth.tar --config configs/coco_local.yaml
python3 test_gl.py runs/coco_m0.3/model_best_rsum.pth.tar --config configs/coco_local.yaml
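
The six commands above follow one pattern per dataset, so they can be generated rather than typed. The following is a minimal sketch that assumes the checkpoint and config paths shown above. The helper name eval_commands is hypothetical, and the script only prints the commands rather than running them.

```python
import shlex


def eval_commands(dataset):
    """Build the three evaluation commands for `dataset` ("f30k" or "coco")."""
    if dataset not in ("f30k", "coco"):
        raise ValueError(f"unknown dataset: {dataset!r}")
    checkpoint = f"runs/{dataset}_m0.3/model_best_rsum.pth.tar"
    # (script, config) pairs in the same order as the commands listed above.
    runs = [
        ("test.py", f"configs/{dataset}_global.yaml"),
        ("test.py", f"configs/{dataset}_local.yaml"),
        ("test_gl.py", f"configs/{dataset}_local.yaml"),
    ]
    return [
        ["python3", script, checkpoint, "--config", config]
        for script, config in runs
    ]


if __name__ == "__main__":
    # Dry run: print each command instead of executing it.
    for dataset in ("f30k", "coco"):
        for argv in eval_commands(dataset):
            print(shlex.join(argv))
```

To execute the commands instead of printing them, each list can be passed to subprocess.run(argv, check=True).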

Train

To train the model with a given configuration, run the corresponding command:

python3 train.py --config configs/f30k_all.yaml --logger_name runs/f30k_m0.3
python3 train.py --config configs/coco_all.yaml --logger_name runs/coco_m0.3
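
Before evaluating a model you trained yourself, it can help to confirm that a checkpoint exists. The sketch below assumes that training writes its best checkpoint as model_best_rsum.pth.tar inside the --logger_name directory. This is an inference from the paths used in the Evaluate section, not something this README states. The helper name best_checkpoint is hypothetical.

```python
from pathlib import Path

# Assumed checkpoint file name, taken from the paths in the Evaluate section.
CHECKPOINT_NAME = "model_best_rsum.pth.tar"


def best_checkpoint(logger_name):
    """Return the assumed best-checkpoint path for a run directory."""
    return Path(logger_name) / CHECKPOINT_NAME


if __name__ == "__main__":
    for run_dir in ("runs/f30k_m0.3", "runs/coco_m0.3"):
        path = best_checkpoint(run_dir)
        status = "found" if path.is_file() else "not found"
        print(f"{path}: {status}")
```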

Citation

Please cite this work if you find it useful:

@article{liu2023efficient,
  title={Efficient Token-Guided Image-Text Retrieval with Consistent Multimodal Contrastive Training},
  author={Liu, Chong and Zhang, Yuqi and Wang, Hongsong and Chen, Weihua and Wang, Fan and Huang, Yan and Shen, Yi-Dong and Wang, Liang},
  journal={IEEE Transactions on Image Processing},
  year={2023}
}
