Structured Attentions for Visual Question Answering

This repository contains most of the code needed to reproduce the experimental results of the paper Structured Attentions for Visual Question Answering on the VQA-1.0 and VQA-2.0 datasets. Currently only the accelerated version of Mean Field is provided, which is the version used in the VQA 2.0 challenge.
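For readers unfamiliar with mean field inference, the following is a minimal, generic sketch of the idea. It is not the accelerated implementation in this repository. It assumes one binary variable per image region on a grid, 4-connected neighbours, and a single scalar coupling weight; the names `mean_field_grid`, `unary`, `coupling`, and `n_iters` are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def mean_field_grid(unary, coupling=0.5, n_iters=5):
    """Synchronous mean field updates for a binary pairwise model on a grid.

    Assumed model (illustrative only): p(z) is proportional to
    exp(sum_i unary_i * z_i + coupling * sum_{(i,j)} z_i * z_j),
    with z_i in {0, 1} and (i, j) ranging over 4-connected neighbours.
    Returns q, where q[r][c] approximates P(z_rc = 1).
    """
    rows, cols = len(unary), len(unary[0])
    # Start from the marginals implied by the unary scores alone.
    q = [[sigmoid(u) for u in row] for row in unary]
    for _ in range(n_iters):
        new_q = []
        for r in range(rows):
            new_row = []
            for c in range(cols):
                # Sum the current beliefs of the in-bounds neighbours.
                neighbor_sum = 0.0
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        neighbor_sum += q[rr][cc]
                new_row.append(sigmoid(unary[r][c] + coupling * neighbor_sum))
            new_q.append(new_row)
        q = new_q
    return q


if __name__ == "__main__":
    scores = [[2.0, 0.0, -1.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 2.0]]
    for row in mean_field_grid(scores):
        print(["%.3f" % v for v in row])
```

Each iteration replaces every region's belief with a sigmoid of its own score plus the weighted beliefs of its neighbours, so a fixed number of iterations can be unrolled like layers of a network.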

The framework of the proposed network.

Prerequisites

To reproduce the experimental results, you need the following:

mxnet, cloned and compiled at these revisions: mxnet@c9e252, cub@89de7ab, dmlc-core@3dfbc6, nnvm@d3558d, ps-lite@acdb69, mshadow@8eb1e0. Later versions of mxnet changed the optimizers (among other things), and the code in this repository has not yet been adapted to them.
ResNet-152 features of MS COCO images, extracted with MCB's preprocessing code.
Our training question and answer data for VQA-2.0: Baidu Pan.
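Pinning six revisions by hand is easy to get wrong, so the sketch below generates the corresponding `git checkout` commands from the list above. It assumes that each dependency is a submodule located in a directory of the same name directly under the mxnet checkout; that layout, the helper name `checkout_commands`, and the default directory `mxnet` are assumptions, not something this repository specifies.

```python
# Revisions copied verbatim from the prerequisites list.
PINNED_COMMITS = {
    "mxnet": "c9e252",
    "cub": "89de7ab",
    "dmlc-core": "3dfbc6",
    "nnvm": "d3558d",
    "ps-lite": "acdb69",
    "mshadow": "8eb1e0",
}


def checkout_commands(mxnet_dir="mxnet"):
    """Return one `git checkout` command per pinned component.

    Assumption: submodules live at <mxnet_dir>/<name>. Adjust the
    paths if your checkout is laid out differently.
    """
    commands = []
    for name, commit in PINNED_COMMITS.items():
        path = mxnet_dir if name == "mxnet" else mxnet_dir + "/" + name
        commands.append(["git", "-C", path, "checkout", commit])
    return commands


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for cmd in checkout_commands():
        print(" ".join(cmd))
```

The script only prints the commands so they can be inspected before running. The hashes are abbreviated; if git reports one as ambiguous, the full hash is required, and it is not given here.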

Training from scratch

Set the arguments and run train_VQA2.0.py.

Pretrained models

The best single-model accuracies on test-dev of VQA-1.0 and VQA-2.0, with skip-thought vector initialization and Visual Genome training data, are 67.19 and 64.78, respectively. Here is the model on VQA-2.0.
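As background on how such accuracy figures are computed, here is a minimal sketch of the standard VQA per-question accuracy, under which an answer counts as fully correct if at least three human annotators gave it. The function name `vqa_accuracy` is hypothetical, and the sketch omits the answer-string normalization and annotator-subset averaging performed by the official evaluation, so it will not exactly reproduce the numbers above.

```python
def vqa_accuracy(predicted, human_answers):
    """Simplified VQA accuracy for one question.

    predicted: the model's answer string.
    human_answers: the list of human answers for the question
    (10 per question in the VQA datasets).
    Returns min(number of matching humans / 3, 1).
    """
    matches = sum(1 for answer in human_answers if answer == predicted)
    return min(matches / 3.0, 1.0)


def mean_accuracy(predictions, all_human_answers):
    """Average the per-question accuracy and report it as a percentage."""
    scores = [vqa_accuracy(p, h) for p, h in zip(predictions, all_human_answers)]
    return 100.0 * sum(scores) / len(scores)
```

For example, an answer given by only one of the ten annotators earns one third of a point rather than zero.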

Citation

If you find this repository helpful, please cite:

@inproceedings{chen2017sva,
  title={Structured Attentions for Visual Question Answering},
  author={Zhu, Chen and Zhao, Yanpeng and Huang, Shuaiyi and Tu, Kewei and Ma, Yi},
  booktitle={IEEE International Conference on Computer Vision (ICCV)},
  year={2017},
}

Licence

This code is distributed under the MIT License.
