MFB and MFH refer to MFB-baseline and MFH-baseline, respectively.
The MFB and MFH results are obtained by training on the train set and testing on the val set, using ResNet152 pool5 features. The MFH+CoAtt+GloVe result is obtained by training on the train+val sets and testing on the test-dev set.
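For readers new to the two baselines, here is a minimal, dependency-free sketch of MFB and MFH pooling as described in the two papers cited below. It is not the repository's implementation: it omits dropout and any learning, and all function names (`mfb`, `mfh`, `project`, etc.) are hypothetical. `x` and `y` stand for a question feature and an image feature, `U` and `V` are projection matrices of shape (input dim) × (o·k), `k` is the number of factors per output unit, `o` is the pooled output size, and MFH cascades `p` MFB blocks.

```python
import math
import random


def project(W, x):
    """Compute W^T x; W has len(x) rows, each of length m."""
    m = len(W[0])
    return [sum(W[i][j] * x[i] for i in range(len(x))) for j in range(m)]


def sum_pool(v, k):
    """Sum each group of k consecutive entries (length o*k -> o)."""
    assert len(v) % k == 0, "expanded length must be a multiple of k"
    return [sum(v[i:i + k]) for i in range(0, len(v), k)]


def power_l2_normalize(z):
    """Signed square root, then L2 normalization."""
    z = [math.copysign(math.sqrt(abs(t)), t) for t in z]
    norm = math.sqrt(sum(t * t for t in z)) or 1.0  # avoid dividing by zero
    return [t / norm for t in z]


def mfb_expand(x, y, U, V):
    """Expand stage: element-wise product of U^T x and V^T y (length o*k)."""
    return [a * b for a, b in zip(project(U, x), project(V, y))]


def mfb(x, y, U, V, k):
    """MFB: expand, sum-pool over the k factors, normalize (length o)."""
    return power_l2_normalize(sum_pool(mfb_expand(x, y, U, V), k))


def mfh(x, y, Us, Vs, k):
    """MFH: p cascaded MFB blocks. Block i's expanded vector is multiplied
    element-wise by block i-1's; the p normalized outputs are concatenated
    (length o*p)."""
    out, prev = [], None
    for U, V in zip(Us, Vs):
        expanded = mfb_expand(x, y, U, V)
        if prev is not None:
            expanded = [a * b for a, b in zip(expanded, prev)]
        prev = expanded
        out.extend(power_l2_normalize(sum_pool(expanded, k)))
    return out


def random_matrix(rows, cols, rng):
    """Small Gaussian matrix used only for the toy demo below."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    d_x, d_y, o, k, p = 8, 6, 4, 5, 2  # toy sizes, not the papers' settings
    x = [rng.gauss(0.0, 1.0) for _ in range(d_x)]
    y = [rng.gauss(0.0, 1.0) for _ in range(d_y)]
    Us = [random_matrix(d_x, o * k, rng) for _ in range(p)]
    Vs = [random_matrix(d_y, o * k, rng) for _ in range(p)]
    print(len(mfb(x, y, Us[0], Vs[0], k)))  # o
    print(len(mfh(x, y, Us, Vs, k)))        # o * p
```

The sketch makes the difference between the two baselines concrete: MFB is a single expand, pool, and normalize step, while MFH with `p` blocks reuses each block's expanded vector in the next one and returns an `o·p`-dimensional feature.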
Training from Scratch
$ python train_*.py
Most of the hyper-parameters and configurations are defined, with comments, in the config.py file.
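As an illustration only, a config module of this kind might group the settings as below. Every field name and value here is a hypothetical placeholder, not the contents of the repository's config.py; the splits simply mirror the evaluation protocol described above, and `k`, `o`, `p` follow the MFB/MFH notation.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Config:
    # Hypothetical fields; the real config.py may use different names and values.
    model: str = "mfh_coatt_glove"           # which train_*.py variant to run
    train_splits: tuple = ("train", "val")   # splits used for training
    eval_split: str = "test-dev"             # split used for evaluation
    mfb_factor_num: int = 5                  # k: factors per pooled output unit
    mfb_out_dim: int = 1000                  # o: pooled output dimension
    mfh_order: int = 2                       # p: number of cascaded MFB blocks
    use_glove: bool = True                   # requires spaCy + GloVe vectors

    @property
    def mfb_expand_dim(self) -> int:
        # Size of the expanded (pre-pooling) vector, o * k.
        return self.mfb_out_dim * self.mfb_factor_num


def with_overrides(cfg: Config, **overrides) -> Config:
    """Return a copy of cfg with selected fields changed.

    dataclasses.replace raises TypeError for an unknown field name,
    so a misspelled override fails loudly instead of being ignored.
    """
    return replace(cfg, **overrides)


# Example: a baseline run trained on train and evaluated on val.
baseline = with_overrides(
    Config(),
    model="mfb_baseline",
    train_splits=("train",),
    eval_split="val",
    mfh_order=1,
    use_glove=False,
)
```

Freezing the dataclass keeps one run's settings from being mutated mid-training; variants are derived by copying rather than editing in place.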
A pretrained GloVe word-embedding model (loaded through the spaCy library) is required to train the mfb/h-coatt-glove models. Installation instructions for spaCy and the GloVe model can be found here.
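One common way to use such vectors is to build an embedding matrix whose row `i` holds the pretrained vector for vocabulary word `i`. The sketch below assumes a generic `lookup(word)` function that returns an all-zero vector for unknown words; `build_embedding_matrix` is a hypothetical helper, not part of this repository, and how the repository actually feeds GloVe features into the model may differ.

```python
from typing import Callable, List, Sequence, Tuple


def build_embedding_matrix(
    vocab: Sequence[str],
    lookup: Callable[[str], Sequence[float]],
    dim: int,
) -> Tuple[List[List[float]], List[str]]:
    """Stack pretrained vectors for each vocab word into a matrix.

    Returns (matrix, oov) where oov lists words whose vector is all zeros,
    i.e. words the pretrained model has no vector for.
    """
    matrix: List[List[float]] = []
    oov: List[str] = []
    for word in vocab:
        vec = [float(v) for v in lookup(word)]
        if len(vec) != dim:
            raise ValueError(f"expected {dim}-d vector for {word!r}, got {len(vec)}")
        if not any(vec):
            oov.append(word)  # all-zero vector: no pretrained embedding available
        matrix.append(vec)
    return matrix, oov
```

With spaCy, the lookup could be `lambda w: nlp.vocab[w].vector` with `dim=nlp.vocab.vectors_length`, where `nlp` is a loaded model that ships word vectors; the model name depends on your spaCy version and the installation instructions above. Returning the OOV list makes it easy to check vocabulary coverage before training.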
Citation
If you find this implementation helpful, please consider citing:
@inproceedings{yu2017mfb,
title={Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering},
author={Yu, Zhou and Yu, Jun and Fan, Jianping and Tao, Dacheng},
booktitle={IEEE International Conference on Computer Vision (ICCV)},
year={2017}
}
@article{yu2017beyond,
title={Beyond Bilinear: Generalized Multi-modal Factorized High-order Pooling for Visual Question Answering},
author={Yu, Zhou and Yu, Jun and Xiang, Chenchao and Fan, Jianping and Tao, Dacheng},
journal={arXiv preprint arXiv:1708.03619},
year={2017}
}
About
This project is out of date, and I no longer remember the details of its implementation.