Graph-Structured Representations for Visual Question Answering

09/19/2016
by Damien Teney, et al.

This paper proposes to improve visual question answering (VQA) with structured representations of both scene contents and questions. A key challenge in VQA is that it requires joint reasoning over the visual and text domains. The predominant CNN/LSTM-based approach to VQA is limited by monolithic vector representations that largely ignore structure in the scene and in the form of the question. CNN feature vectors cannot effectively capture situations as simple as multiple object instances, and LSTMs process questions as series of words, which does not reflect the true complexity of language structure. We instead propose to build graphs over the scene objects and over the question words, and we describe a deep neural network that exploits the structure in these representations. This approach shows a significant benefit over the sequential processing of LSTMs. The overall efficacy of our approach is demonstrated by significant improvements over the state-of-the-art, from 71.2 to 74.4, and from 34.7 (…) on scenes with fine-grained differences and opposite yes/no answers to the same question.
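As a rough illustration of replacing a word sequence with a graph, the Python sketch below builds an adjacency list over question words from dependency edges and a fully connected graph over scene objects. It then runs one round of neighbour aggregation. The dependency edges are supplied by hand in place of a real parser. The function names `build_question_graph`, `build_scene_graph` and `message_pass` are hypothetical, and the simple averaging update is a stand-in chosen for illustration. It is not the authors' network, whose learned update and matching steps are not reproduced here.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def build_question_graph(words: List[str], deps: List[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Undirected adjacency over word indices, built from (head, dependent) pairs.

    The dependency pairs are assumed to come from some external parser.
    """
    adj: Dict[int, List[int]] = {i: [] for i in range(len(words))}
    for head, dep in deps:
        adj[head].append(dep)
        adj[dep].append(head)
    return adj


def build_scene_graph(num_objects: int) -> Dict[int, List[int]]:
    """Fully connected graph over detected objects, without self-loops."""
    return {i: [j for j in range(num_objects) if j != i] for i in range(num_objects)}


def message_pass(features: List[Vector], adj: Dict[int, List[int]]) -> List[Vector]:
    """One illustrative round: each node averages its own feature with the mean of its neighbours.

    Isolated nodes keep their feature unchanged.
    """
    updated: List[Vector] = []
    for i, own in enumerate(features):
        neigh = adj[i]
        if not neigh:
            updated.append(list(own))
            continue
        dim = len(own)
        mean = [sum(features[j][d] for j in neigh) / len(neigh) for d in range(dim)]
        updated.append([(own[d] + mean[d]) / 2 for d in range(dim)])
    return updated


if __name__ == "__main__":
    words = ["what", "color", "is", "the", "cat"]
    # Hypothetical dependency edges: is->color, color->what, is->cat, cat->the
    q_graph = build_question_graph(words, [(2, 1), (1, 0), (2, 4), (4, 3)])
    print(q_graph)

    # Three scene objects with toy 2-D features
    s_graph = build_scene_graph(3)
    print(message_pass([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], s_graph))
```

Unlike a left-to-right pass over `words`, the question graph links "cat" directly to "is" through the parse. Every scene object likewise exchanges information with every other object in a single step.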

research
01/14/2021

Understanding the Role of Scene Graphs in Visual Question Answering

Visual Question Answering (VQA) is of tremendous interest to the researc...
research
08/05/2019

Answering Questions about Data Visualizations using Efficient Bimodal Fusion

Chart question answering (CQA) is a newly proposed visual question answe...
research
03/23/2017

Recurrent and Contextual Models for Visual Question Answering

We propose a series of recurrent and contextual neural network models fo...
research
04/16/2016

Learning Models for Actions and Person-Object Interactions with Transfer to Question Answering

This paper proposes deep convolutional network models that utilize local...
research
05/16/2022

A Neuro-Symbolic ASP Pipeline for Visual Question Answering

We present a neuro-symbolic visual question answering (VQA) pipeline for...
research
01/31/2021

An Empirical Study on the Generalization Power of Neural Representations Learned via Visual Guessing Games

Guessing games are a prototypical instance of the "learning by interacti...
research
01/25/2022

SA-VQA: Structured Alignment of Visual and Semantic Representations for Visual Question Answering

Visual Question Answering (VQA) attracts much attention from both indust...