Modeling Coreference Relations in Visual Dialog

03/06/2022
by Mingxiao Li, et al.

Visual dialog is a vision-language task in which an agent must answer a series of questions grounded in an image, based on its understanding of the dialog history and the image. The occurrence of coreference relations in the dialog makes it a more challenging task than visual question answering. Most previous work has focused on learning better multi-modal representations or on exploring different ways of fusing visual and language features, while coreferences in the dialog have largely been ignored. In this paper, based on linguistic knowledge and discourse features of human dialog, we propose two soft constraints that can improve the model's ability to resolve coreferences in dialog in an unsupervised way. Experimental results on the VisDial v1.0 dataset show that our model, which integrates two novel and linguistically inspired soft constraints in a deep transformer neural architecture, obtains new state-of-the-art performance in terms of recall at 1 and other evaluation metrics compared to existing models, and does so without pretraining on other vision-language datasets. Our qualitative results also demonstrate the effectiveness of the proposed method.

research
12/06/2018

Recursive Visual Attention in Visual Dialog

Visual dialog is a challenging vision-language task, which requires the ...
research
04/28/2020

VD-BERT: A Unified Vision and Dialog Transformer with BERT

Visual dialog is a challenging vision-language task, where a dialog agen...
research
05/24/2021

Learning Better Visual Dialog Agents with Pretrained Visual-Linguistic Representation

GuessWhat?! is a two-player visual dialog guessing game where player A a...
research
10/13/2019

Granular Multimodal Attention Networks for Visual Dialog

Vision and language tasks have benefited from attention. There have been...
research
12/05/2019

Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline

Prior work in visual dialog has focused on training deep neural models o...
research
04/07/2023

Gated Mechanism Enhanced Multi-Task Learning for Dialog Routing

Currently, human-bot symbiosis dialog systems, e.g., pre- and after-sale...
research
11/26/2019

Efficient Attention Mechanism for Handling All the Interactions between Many Inputs with Application to Visual Dialog

It has been a primary concern in recent studies of vision and language t...