A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models
Welcome to the official code repository for the paper "A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models" authored by Noriyuki Kojima, Hadar Averbuch-Elor, and Yoav Artzi.
Key to tasks that require reasoning about natural language in visual contexts is grounding words and phrases to image regions. However, observing this grounding in contemporary models is complex, even if it is generally expected to take place if the task is addressed in a way that is conducive to generalization. We propose a framework to jointly study task performance and phrase grounding, and introduce three benchmarks to study the relation between the two. Our results show that contemporary models demonstrate inconsistency between their ability to ground phrases and solve tasks. We show how this can be addressed through brute-force training on phrase grounding annotations, and analyze the dynamics it creates.
Codebase
Installation
Set up the conda environment: conda create -n grounding python=3.8
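After creating the environment, activate it and install the Python dependencies. The sketch below is a minimal setup, assuming the repository ships a requirements.txt; that file name is an assumption, so substitute this repository's actual dependency list:

conda activate grounding
# requirements.txt is assumed here; replace with the repository's actual dependency file
pip install -r requirements.txt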
Citation
If our work aids your research, please cite our paper:
@misc{Kojima2023:grounding,
  title = {A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models},
  author = {Noriyuki Kojima and Hadar Averbuch-Elor and Yoav Artzi},
  year = {2023},
  eprint = {},
  archiveprefix = {arXiv}
}