The Long-Short Story of Movie Description

06/04/2015
by   Anna Rohrbach, et al.
0

Generating descriptions for videos has many applications including assisting blind people and human-robot interaction. The recent advances in image captioning as well as the release of large-scale movie description datasets such as MPII Movie Description allow to study this task in more depth. Many of the proposed methods for image captioning rely on pre-trained object classifier CNNs and Long-Short Term Memory recurrent networks (LSTMs) for generating descriptions. While image description focuses on objects, we argue that it is important to distinguish verbs, objects, and places in the challenging setting of movie description. In this work we show how to learn robust visual classifiers from the weak annotations of the sentence descriptions. Based on these visual classifiers we learn how to generate a description using an LSTM. We explore different design choices to build and train the LSTM and achieve the best performance to date on the challenging MPII-MD dataset. We compare and analyze our approach and prior work along various dimensions to better understand the key challenges of the movie description task.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/12/2016

Movie Description

Audio Description (AD) provides linguistic descriptions of movies and al...
research
12/09/2015

Video captioning with recurrent networks based on frame- and video-level features and visual content classification

In this paper, we describe the system for generating textual description...
research
01/12/2015

A Dataset for Movie Description

Descriptive video service (DVS) provides linguistic descriptions of movi...
research
03/29/2023

AutoAD: Movie Description in Context

The objective of this paper is an automatic Audio Description (AD) model...
research
04/05/2017

Generating Descriptions with Grounded and Co-Referenced People

Learning how to generate descriptions of images or videos received major...
research
08/22/2020

Identity-Aware Multi-Sentence Video Description

Standard video and movie description tasks abstract away from person ide...
research
11/24/2017

Convolutional Image Captioning

Image captioning is an important but challenging task, applicable to vir...

Please sign up or login with your details

Forgot password? Click here to reset