Movie101: A New Movie Understanding Benchmark

by Zihao Yue, et al.

To help visually impaired audiences enjoy movies, automatic movie narrating systems are expected to narrate accurate, coherent, and role-aware plots during the stretches where no actors are speaking. Existing works benchmark this challenge as a standard video captioning task under simplifications such as removing role names and evaluating narrations with n-gram-based metrics, which makes it difficult for automatic systems to meet the needs of real application scenarios. To narrow this gap, we construct a large-scale Chinese movie benchmark named Movie101. Closer to real scenarios, the Movie Clip Narrating (MCN) task in our benchmark asks models to generate role-aware narration paragraphs for complete movie clips in which no actors are speaking. External knowledge, such as role information and movie genres, is also provided for better movie understanding. In addition, we propose a new metric, Movie Narration Score (MNScore), for evaluating movie narration; it achieves the best correlation with human evaluation. Our benchmark also supports the Temporal Narration Grounding (TNG) task, which investigates clip localization given text descriptions. For both tasks, our proposed methods leverage external knowledge effectively and outperform carefully designed baselines. The dataset and code are released at




Is "my favorite new movie" my favorite movie? Probing the Understanding of Recursive Noun Phrases

Recursive noun phrases (NPs) have interesting semantic properties. For e...

Movie Description

Audio Description (AD) provides linguistic descriptions of movies and al...

M-VAD Names: a Dataset for Video Captioning with Naming

Current movie captioning architectures are not capable of mentioning cha...

Detecting Spoilers in Movie Reviews with External Movie Knowledge and User Networks

Online movie review platforms are providing crowdsourced feedback for th...

V2C: Visual Voice Cloning

Existing Voice Cloning (VC) tasks aim to convert a paragraph text to a s...

AutoAD: Movie Description in Context

The objective of this paper is an automatic Audio Description (AD) model...

Identity-Aware Multi-Sentence Video Description

Standard video and movie description tasks abstract away from person ide...
