Audio-Visual Scene-Aware Dialog

01/25/2019
by Huda Alamri, et al.

We introduce the task of scene-aware dialog. Given a follow-up question in an ongoing dialog about a video, our goal is to generate a complete and natural response to the question given (a) an input video and (b) the history of previous turns in the dialog. To succeed, agents must ground the semantics in the video and leverage contextual cues from the dialog history to answer the question. To benchmark this task, we introduce the Audio Visual Scene-Aware Dialog (AVSD) dataset. For each of more than 11,000 videos of human actions from the Charades dataset, our dataset contains a dialog about the video, plus a final summary of the video by one of the dialog participants. We train several baseline systems for this task and evaluate the trained models using several qualitative and quantitative metrics. Our results indicate that models must comprehend all the available inputs (video, audio, question, and dialog history) to perform well on this dataset.
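To make the task's inputs and outputs concrete, here is a minimal Python sketch, assuming a simple text-only layout. The class `SceneAwareDialogExample`, its field names, and the helper `build_text_context` are hypothetical illustrations, not the AVSD release format or the authors' baseline code. A real model would also encode the video and audio alongside the text context built here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SceneAwareDialogExample:
    """Hypothetical container for one scene-aware dialog instance.

    Field names are illustrative only; they are not the AVSD file format.
    """
    video_id: str  # identifier of the source video (hypothetical field)
    history: List[Tuple[str, str]] = field(default_factory=list)  # previous (question, answer) turns
    question: str = ""  # follow-up question the agent must answer
    summary: str = ""   # final video summary written by a dialog participant


def build_text_context(example: SceneAwareDialogExample, max_turns: int = 3) -> str:
    """Flatten the most recent dialog turns plus the current question into one string.

    Only the last `max_turns` history turns are kept; a value of 0 keeps none.
    The returned text would be combined with video/audio features by a model.
    """
    # Guard max_turns > 0: history[-0:] would return the whole list, not an empty one.
    recent = example.history[-max_turns:] if max_turns > 0 else []
    lines = [f"Q: {q} A: {a}" for q, a in recent]
    lines.append(f"Q: {example.question} A:")  # the answer slot is left open for generation
    return "\n".join(lines)


if __name__ == "__main__":
    ex = SceneAwareDialogExample(
        video_id="hypothetical_video",
        history=[("Is there a person?", "Yes, one man."),
                 ("What is he doing?", "He is opening a door.")],
        question="Does he say anything?",
    )
    print(build_text_context(ex, max_turns=1))
```

With `max_turns=1`, only the most recent turn is kept, so the script prints the "What is he doing?" exchange followed by the open follow-up question. Truncating the history is one simple design choice for bounding input length; the abstract's finding that dialog history matters suggests that dropping turns too aggressively could hurt answer quality.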


