On the hidden treasure of dialog in video question answering

03/26/2021
by Deniz Engin, et al.

High-level understanding of stories in video such as movies and TV shows from raw data is extremely challenging. Modern video question answering (VideoQA) systems often use additional human-made sources like plot synopses, scripts, video descriptions or knowledge bases. In this work, we present a new approach to understand the whole story without such external sources. The secret lies in the dialog: unlike any prior work, we treat dialog as a noisy source to be converted into text description via dialog summarization, much like recent methods treat video. The input of each modality is encoded by transformers independently, and a simple fusion method combines all modalities, using soft temporal attention for localization over long inputs. Our model outperforms the state of the art on the KnowIT VQA dataset by a large margin, without using question-specific human annotation or human-made plot summaries. It even outperforms human evaluators who have never watched any whole episode before.
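
As a rough illustration of two ideas named in the abstract, soft temporal attention over a long input and a simple fusion of modalities, here is a minimal sketch. It is not the authors' implementation: the function names, the windowing of the input, and the choice of averaging answer scores across modalities are all assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_temporal_attention(window_feats, window_scores):
    """Pool per-window feature vectors with softmax weights over time.

    window_feats: list of equal-length feature vectors, one per temporal
                  window of a long input (hypothetical windowing).
    window_scores: one relevance score per window (assumed to be given).
    Returns the weighted-average feature vector and the weights.
    """
    weights = softmax(window_scores)
    dim = len(window_feats[0])
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, window_feats))
        for d in range(dim)
    ]
    return pooled, weights


def fuse_modalities(scores_per_modality):
    """Assumed late fusion: average the answer scores of each modality."""
    n = len(scores_per_modality)
    return [sum(column) / n for column in zip(*scores_per_modality)]
```

With equal window scores, the attention reduces to a plain average over windows; as one score grows, the pooled vector moves toward that window, which is the sense in which soft attention localizes without a hard selection.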

research
07/17/2020

Knowledge-Based Video Question Answering with Unsupervised Scene Descriptions

To understand movies, humans constantly reason over the dialogues and ac...
research
03/29/2018

Two can play this Game: Visual Dialog with Discriminative Question Generation and Answering

Human conversation is a complex mechanism with subtle nuances. It is hen...
research
01/15/2021

Recent Advances in Video Question Answering: A Review of Datasets and Methods

Video Question Answering (VQA) is a recent emerging challenging task in ...
research
07/31/2019

Learning Question-Guided Video Representation for Multi-Turn Video Question Answering

Understanding and conversing about dynamic scenes is one of the key capa...
research
10/27/2020

Co-attentional Transformers for Story-Based Video Understanding

Inspired by recent trends in vision and language learning, we explore ap...
research
04/11/2019

Factor Graph Attention

Dialog is an effective way to exchange information, but subtle details a...
research
02/20/2018

Combining Textual Content and Structure to Improve Dialog Similarity

Chatbots, taking advantage of the success of the messaging apps and rece...
