GoG: Relation-aware Graph-over-Graph Network for Visual Dialog

09/17/2021
by   Feilong Chen, et al.

Visual dialog, which aims to hold a meaningful conversation with humans about a given image, is a challenging task that requires models to reason about the complex dependencies among visual content, dialog history, and the current question. Graph neural networks have recently been applied to model the implicit relations between objects in an image or dialog. However, these approaches neglect the importance of 1) coreference relations among dialog history and dependency relations between words for the question representation; and 2) the representation of the image based on the fully represented question. Therefore, we propose a novel relation-aware graph-over-graph network (GoG) for visual dialog. Specifically, GoG consists of three sequential graphs: 1) H-Graph, which aims to capture coreference relations among dialog history; 2) History-aware Q-Graph, which aims to fully understand the question by capturing dependency relations between words based on coreference resolution on the dialog history; and 3) Question-aware I-Graph, which aims to capture the relations between objects in an image based on the full question representation. We add GoG to an existing visual dialog model as an additional feature representation module. Experimental results show that our model outperforms a strong baseline in both generative and discriminative settings by a significant margin.
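The sequential flow of the three graphs can be illustrated with a minimal sketch. The abstract does not specify the layer equations, so everything below is an assumption made for illustration: the function names (graph_layer, mean_pool, gog_forward), the additive conditioning on the previous graph's output, the dot-product attention over neighbors, and the mean pooling are hypothetical stand-ins, not the authors' implementation. The only property taken from the abstract is the ordering: the H-Graph output conditions the Q-Graph, whose output conditions the I-Graph.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def graph_layer(nodes, adjacency, context):
    """One round of context-conditioned message passing (assumed form).

    nodes:     list of feature vectors, one per graph node
    adjacency: adjacency[i][j] is truthy if node i attends to node j
               (a self-loop is always added so no node is isolated)
    context:   vector from the previous graph, added to every node
    """
    conditioned = [[x + c for x, c in zip(n, context)] for n in nodes]
    updated = []
    for i, query in enumerate(conditioned):
        nbrs = [j for j, e in enumerate(adjacency[i]) if e or j == i]
        weights = softmax([dot(query, conditioned[j]) for j in nbrs])
        updated.append([
            sum(w * conditioned[j][d] for w, j in zip(weights, nbrs))
            for d in range(len(query))
        ])
    return updated


def mean_pool(nodes):
    """Average node vectors into a single graph-level vector."""
    n = len(nodes)
    return [sum(v[d] for v in nodes) / n for d in range(len(nodes[0]))]


def gog_forward(history, h_adj, words, q_adj, objects, i_adj):
    """Chain the three graphs: H-Graph -> Q-Graph -> I-Graph."""
    dim = len(history[0])
    # 1) H-Graph: edges stand for coreference links between history turns.
    h_ctx = mean_pool(graph_layer(history, h_adj, [0.0] * dim))
    # 2) History-aware Q-Graph: edges stand for word dependency relations.
    q_ctx = mean_pool(graph_layer(words, q_adj, h_ctx))
    # 3) Question-aware I-Graph: edges stand for relations between objects.
    return mean_pool(graph_layer(objects, i_adj, q_ctx))


# Toy inputs with 2-dimensional features (values are arbitrary).
history = [[1.0, 0.0], [0.0, 1.0]]
objects = [[1.0, 0.0], [0.0, 1.0]]
full = [[1, 1], [1, 1]]
out_a = gog_forward(history, full, [[1.0, 0.0]], [[1]], objects, full)
out_b = gog_forward(history, full, [[0.0, 1.0]], [[1]], objects, full)
```

In this toy setup, out_a and out_b differ only because the question word vector differs, which is the sense in which the final image representation is question-aware. A real model would use learned projections, edge-type-specific weights, and pretrained features in place of these raw vectors.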


