Multi-modal Factorized Bilinear Pooling (MFB) for VQA

This is an unofficial PyTorch implementation of "Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering" and "Beyond Bilinear: Generalized Multi-modal Factorized High-order Pooling for Visual Question Answering".

The MFB-baseline and MFH-baseline results can be replicated. I was not able to replicate the MFH-coatt-glove result; the cause is probably a detail I have missed.

The author helped me a lot when I tried to replicate the results. Many thanks.

The official implementation, based on pycaffe, is available here.
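For readers new to the method, the core MFB fusion step described in the first paper can be sketched as follows. This is a minimal illustration written with NumPy, not code taken from this repository: the function name `mfb_pool`, the `normalize` flag, and the toy dimensions are all assumptions made for the example, and the dropout applied in the paper after the element-wise product is left out.

```python
import numpy as np

def mfb_pool(x, y, U, V, k, normalize=True):
    """Fuse feature vectors x (m,) and y (n,) into an o-dimensional vector.

    U has shape (m, k*o) and V has shape (n, k*o); k is the number of factors.
    Hypothetical helper for illustration only.
    """
    joint = (x @ U) * (y @ V)                # project both inputs, multiply element-wise: (k*o,)
    z = joint.reshape(-1, k).sum(axis=1)     # sum-pool over non-overlapping windows of size k: (o,)
    if normalize:
        z = np.sign(z) * np.sqrt(np.abs(z))  # power normalization
        z = z / (np.linalg.norm(z) + 1e-12)  # L2 normalization
    return z

rng = np.random.default_rng(0)
m, n, k, o = 8, 6, 5, 4                      # toy sizes, not the paper's settings
x, y = rng.standard_normal(m), rng.standard_normal(n)
U, V = rng.standard_normal((m, k * o)), rng.standard_normal((n, k * o))
z = mfb_pool(x, y, U, V, k)
print(z.shape)
```

Each output element equals a low-rank bilinear form x^T U_i V_i^T y, where U_i and V_i are the k columns of U and V belonging to output i. Computing it as two projections followed by a product and a sum avoids materializing the full m-by-n bilinear weight for every output.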

Requirements

Python 2.7, PyTorch 0.2, torchvision 0.1.9, tensorboardX

Result

Datasets\Models	MFB	MFH	MFH+CoAtt+GloVe (FRCN img features)
VQA-1.0	58.75%	59.15%	68.78%

MFB and MFH refer to MFB-baseline and MFH-baseline, respectively.
MFB and MFH were trained on the train set and evaluated on the val set, using ResNet152 pool5 features. MFH+CoAtt+GloVe was trained on the train+val sets and evaluated on the test-dev set.

Training from Scratch

$ python train_*.py

Most of the hyper-parameters and configurations are defined, with comments, in the config.py file.
A pretrained GloVe word embedding model (loaded through the spacy library) is required to train the mfb/h-coatt-glove models. Installation instructions for spacy and the GloVe model can be found here.

Citation

If you find this implementation helpful, please consider citing:

@article{yu2017mfb,
  title={Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering},
  author={Yu, Zhou and Yu, Jun and Fan, Jianping and Tao, Dacheng},
  journal={IEEE International Conference on Computer Vision (ICCV)},
  year={2017}
}
@article{yu2017beyond,
  title={Beyond Bilinear: Generalized Multi-modal Factorized High-order Pooling for Visual Question Answering},
  author={Yu, Zhou and Yu, Jun and Xiang, Chenchao and Fan, Jianping and Tao, Dacheng},
  journal={arXiv preprint arXiv:1708.03619},
  year={2017}
}
