Deep Cross of Intra and Inter Modalities for Visual Question Answering
- DOI: 10.2991/ahis.k.210913.007
- Keywords: Deep Learning, Inter-Modality Fusion, Intra-Modality Fusion, Visual Question Answering
Visual Question Answering (VQA) has recently attracted interest in the deep learning community. The main challenges in VQA are understanding the meaning of each modality and deciding how to fuse their features. In this paper, DXMN (Deep Cross Modality Network) is introduced, which takes into consideration not only inter-modality fusion but also intra-modality fusion. The main idea behind this architecture is to take the position of each feature into account and then capture the relationships between multi-modal features, as well as the relationships within each modality, so that both can be learned more effectively. The architecture is pretrained on question answering datasets such as VQA v2.0, GQA, and Visual Genome, and is later fine-tuned to achieve state-of-the-art performance. DXMN achieves an accuracy of 68.65 on the test-standard split and 68.43 on the test-dev split of the VQA v2.0 dataset.
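The abstract does not spell out the fusion mechanism, but intra-modality fusion (relating features within one modality) and inter-modality fusion (relating features across modalities) are commonly realized with self-attention and cross-attention, respectively. The following is a minimal NumPy sketch of that general pattern; the function names, feature dimensions, and fusion order are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: each query row attends to all key rows.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    return softmax(scores) @ v

def intra_inter_fusion(img_feats, txt_feats):
    # Intra-modality fusion: each modality attends to itself
    # (self-attention), modeling relationships among its own features.
    img_intra = attention(img_feats, img_feats, img_feats)
    txt_intra = attention(txt_feats, txt_feats, txt_feats)
    # Inter-modality fusion: each modality attends to the other
    # (cross-attention), modeling relationships across modalities.
    img_inter = attention(img_intra, txt_intra, txt_intra)
    txt_inter = attention(txt_intra, img_intra, img_intra)
    return img_inter, txt_inter

# Hypothetical shapes: 36 image regions and 14 question tokens,
# both projected to a 64-dimensional feature space.
rng = np.random.default_rng(0)
img = rng.normal(size=(36, 64))
txt = rng.normal(size=(14, 64))
img_fused, txt_fused = intra_inter_fusion(img, txt)
```

In practice the fused features from both streams would be pooled and passed to an answer classifier; stacking several such intra/inter blocks with learned projections is the usual deep variant of this idea.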
- © 2021, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY  - CONF
AU  - Rishav Bhardwaj
PY  - 2021
DA  - 2021/09/13
TI  - Deep Cross of Intra and Inter Modalities for Visual Question Answering
BT  - Proceedings of the 3rd International Conference on Integrated Intelligent Computing Communication & Security (ICIIC 2021)
PB  - Atlantis Press
SP  - 47
EP  - 53
SN  - 2589-4900
UR  - https://doi.org/10.2991/ahis.k.210913.007
DO  - 10.2991/ahis.k.210913.007
ID  - Bhardwaj2021
ER  -