We follow exactly the same settings and data format as MUSIC-AVQA.
Notice: We examined the original annotation files of Clotho-AQA and found that the official open-source annotations were not cleaned, resulting in discrepancies where different annotators gave different answers to the same question. We therefore applied a simple filtering step: a question is considered to have a correct answer only if at least two annotators gave identical answers. Based on this filtering, we obtained a new, more accurate annotation file. The files in the 'metadata' folder are described as follows:
'single_word_[train/val/test].csv': excludes samples with yes/no answers.
'single_word_[train/val/test]_clean.csv': excludes samples with yes/no answers (cleaned data).
'clotho_aqa_[train/val/test]_clean.csv': includes samples with yes/no answers (cleaned data).
'binary_[train/val/test]_clean.csv': contains only samples with yes/no answers (cleaned data).
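The majority-vote filtering described above can be sketched as follows. This is a minimal illustration, not the exact script used for the release; the column names `file_name`, `QuestionText`, and `answer` are assumptions based on the Clotho-AQA CSV layout.

```python
from collections import Counter

def filter_majority(rows, key_fields=("file_name", "QuestionText"), answer_field="answer"):
    """Keep one answer per (audio, question) pair when at least two
    annotators agree; drop pairs with no majority answer.
    `rows` is an iterable of dicts, e.g. from csv.DictReader."""
    groups = {}
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        groups.setdefault(key, []).append(row[answer_field].strip().lower())

    kept = {}
    for key, answers in groups.items():
        answer, count = Counter(answers).most_common(1)[0]
        if count >= 2:  # at least two identical answers -> treat as correct
            kept[key] = answer
    return kept
```

A (question, audio) pair whose three annotators all disagree is dropped entirely, which matches the "at least two identical answers" rule stated above.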
Train and evaluate
Training
python main_MWAFM.py --mode train
Testing
python main_MWAFM.py --mode test
Citation
If you find this work useful, please consider citing it.
@ARTICLE{Li2023MultiScale,
title = {Multi-Scale Attention for Audio Question Answering},
author = {Guangyao Li and Yixin Xu and Di Hu},
journal = {Proc. INTERSPEECH},
year = {2023},
}
Acknowledgement
This research was supported by Public Computing Cloud, Renmin University of China.