Where to Play: Retrieval of Video Segments using Natural-Language Queries

07/02/2017
by Sangkuk Lee, et al.

In this paper, we propose a new approach for retrieving video segments using natural-language queries. Unlike most previous approaches, such as concept-based methods or rule-based structured models, the proposed method uses an image captioning model to construct sentential queries for visual information. Specifically, our approach exploits multiple captions generated from the visual features of each image with 'DenseCap'. The similarities between captions of adjacent images are then calculated and used to track semantically similar captions over multiple frames. Besides introducing this novel idea of 'tracking by captioning', the proposed method is one of the first approaches to use a neural-network language generation model to construct semantic queries that describe the relations and properties of visual information. To evaluate the effectiveness of our approach, we created a new evaluation dataset containing about 348 scene segments from 20 movie trailers. Through quantitative and qualitative evaluation, we show that our method is effective for retrieving video segments using natural-language queries.
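The 'tracking by captioning' idea can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the paper. First, each frame's dense captions are assumed to be available already as plain strings. Second, a bag-of-words cosine similarity stands in for whatever caption similarity the authors actually use. Third, linking is greedy, with a hypothetical threshold of 0.5. The names `Track`, `track_captions`, and `retrieve` are hypothetical. Captions in adjacent frames that are similar enough are chained into a track. A query is then matched against every track, and the best track's frame range is returned as the retrieved segment.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def bag_of_words(text):
    # Assumption: a simple lowercase word-count vector stands in for the paper's caption representation.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between the bag-of-words vectors of two sentences."""
    ca, cb = bag_of_words(a), bag_of_words(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


@dataclass
class Track:
    """A chain of semantically similar captions spanning frames start..end (inclusive)."""
    start: int
    end: int
    captions: list = field(default_factory=list)


def track_captions(frames, threshold=0.5):
    """Greedily link captions of adjacent frames; `frames` is a list of caption lists.

    The threshold value and the greedy matching strategy are illustrative assumptions.
    """
    finished, active = [], []
    for t, captions in enumerate(frames):
        next_active, used = [], set()
        for tr in active:
            # Compare the track's most recent caption with each unused caption in this frame.
            best_i, best_s = None, threshold
            for i, cap in enumerate(captions):
                if i in used:
                    continue
                s = cosine(tr.captions[-1], cap)
                if s >= best_s:
                    best_i, best_s = i, s
            if best_i is None:
                finished.append(tr)  # no similar caption: the track ends at frame t - 1
            else:
                used.add(best_i)
                tr.end = t
                tr.captions.append(captions[best_i])
                next_active.append(tr)
        # Captions that extend no existing track start new tracks.
        for i, cap in enumerate(captions):
            if i not in used:
                next_active.append(Track(start=t, end=t, captions=[cap]))
        active = next_active
    return finished + active


def retrieve(query, tracks, top_k=3):
    """Rank tracks by their best caption-to-query similarity; return (start, end, score)."""
    scored = [(max(cosine(query, c) for c in tr.captions), tr) for tr in tracks]
    scored.sort(key=lambda x: x[0], reverse=True)
    return [(tr.start, tr.end, round(score, 3)) for score, tr in scored[:top_k]]
```

For example, given four frames whose captions are `["a man riding a horse", "blue sky"]`, `["a man riding a brown horse", "blue sky with clouds"]`, `["a man riding a horse", "a red car"]`, and `["a red car on the road"]`, the horse captions form one track over frames 0 to 2. The query "a man on a horse" then retrieves that segment first. A learned sentence embedding would be a natural replacement for the bag-of-words similarity. It would let paraphrases such as "a rider on a horse" match as well.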


research
08/20/2021

Group-based Distinctive Image Captioning with Memory Attention

Describing images using natural language is widely known as image captio...
research
11/10/2019

Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries

This paper explores the task of interactive image retrieval using natura...
research
09/26/2016

Learning Language-Visual Embedding for Movie Understanding with Natural-Language

Learning a joint language-visual embedding has a number of very appealin...
research
04/12/2017

Discriminative Bimodal Networks for Visual Localization and Detection with Natural Language Queries

Associating image regions with text queries has been recently explored a...
research
04/10/2018

Imagine This! Scripts to Compositions to Videos

Imagining a scene described in natural language with realistic layout an...
research
02/16/2016

Contextual Media Retrieval Using Natural Language Queries

The widespread integration of cameras in hand-held and head-worn devices...
research
09/01/2023

Language-Conditioned Change-point Detection to Identify Sub-Tasks in Robotics Domains

In this work, we present an approach to identify sub-tasks within a demo...
