Show Why the Answer is Correct! Towards Explainable AI using Compositional Temporal Attention

05/15/2021
by   Nihar Bendre, et al.

Visual Question Answering (VQA) models have achieved significant success in recent times. Despite this success, they are mostly black-box models that provide no reasoning about the predicted answer, which raises questions about their applicability in safety-critical domains such as autonomous systems and cyber-security. Current state-of-the-art models fail to understand complex questions well and are thus unable to exploit compositionality. To minimize the black-box effect of these models and to make them better exploit compositionality, we propose a Dynamic Neural Network (DMN), which can understand a particular question and then dynamically assemble various relatively shallow deep learning modules from a pool of modules to form a network. We incorporate compositional temporal attention into these deep learning based modules to increase compositionality exploitation. This results in a better understanding of complex questions and also provides reasoning as to why the module predicts a particular answer. Experimental analysis on two benchmark datasets, VQA2.0 and CLEVR, shows that our model outperforms previous approaches on the Visual Question Answering task and provides better reasoning, making it more reliable for mission-critical applications such as safety and security.
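The idea of assembling a question-specific network from a pool of modules, and reading the per-step attention as an explanation, can be illustrated with a toy sketch. This is not the authors' architecture: the modules in the paper are learned deep learning modules, whereas the ones below are hand-written rules, and every name here (`MODULE_POOL`, `find`, `filter_by`, `assemble`, the layout format, the attribute-dictionary scene) is a hypothetical stand-in chosen for illustration only.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: a "scene" is a list of objects described by attribute dictionaries.
# This stands in for image features, which a real model would compute.
Scene = List[Dict[str, str]]
Attention = List[float]  # one weight per object in the scene
Module = Callable[[Scene, Attention, str], Attention]


def find(scene: Scene, attn: Attention, arg: str) -> Attention:
    """Attend to every object that has the attribute value `arg`."""
    return [1.0 if arg in obj.values() else 0.0 for obj in scene]


def filter_by(scene: Scene, attn: Attention, arg: str) -> Attention:
    """Keep the previous attention only on objects that also match `arg`."""
    return [a if arg in obj.values() else 0.0 for a, obj in zip(attn, scene)]


# Hypothetical pool of small modules the assembler can draw from.
MODULE_POOL: Dict[str, Module] = {"find": find, "filter": filter_by}


def assemble(layout: List[Tuple[str, str]]):
    """Chain modules from the pool in the order given by `layout`.

    The returned network answers a counting question and also returns a
    trace with the attention produced at each step.
    """
    def network(scene: Scene):
        attn: Attention = [1.0] * len(scene)
        trace: List[Tuple[str, Attention]] = []
        for name, arg in layout:
            attn = MODULE_POOL[name](scene, attn, arg)
            trace.append((f"{name}[{arg}]", attn))
        return int(sum(attn)), trace

    return network


scene = [
    {"shape": "cube", "color": "red"},
    {"shape": "cube", "color": "blue"},
    {"shape": "sphere", "color": "red"},
]
# Assumed layout for "How many red cubes are there?"; a real system would
# predict this layout from the question instead of hard-coding it.
layout = [("find", "cube"), ("filter", "red")]
answer, trace = assemble(layout)(scene)
print(answer)
for step, attention in trace:
    print(step, attention)
```

The trace is the point of the sketch: each step records which objects were attended to, so the final count can be traced back through the sequence of modules rather than read off a single opaque prediction.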


research
11/09/2015

Neural Module Networks

Visual question answering is fundamentally compositional in nature---a q...
research
06/25/2022

From Shallow to Deep: Compositional Reasoning over Graphs for Visual Question Answering

In order to achieve a general visual question answering (VQA) system, it...
research
10/06/2021

Coarse-to-Fine Reasoning for Visual Question Answering

Bridging the semantic gap between image and question is an important ste...
research
03/31/2018

Visual Question Reasoning on General Dependency Tree

The collaborative reasoning for understanding each image-question pair i...
research
09/06/2018

Interpretable Visual Question Answering by Reasoning on Dependency Trees

Collaborative reasoning for understanding each image-question pair is ve...
research
04/12/2022

AGQA 2.0: An Updated Benchmark for Compositional Spatio-Temporal Reasoning

Prior benchmarks have analyzed models' answers to questions about videos...
research
03/30/2021

AGQA: A Benchmark for Compositional Spatio-Temporal Reasoning

Visual events are a composition of temporal actions involving actors spa...
