Attention-based Natural Language Person Retrieval

05/24/2017
by   Tao Zhou, et al.

Following the recent progress in image classification and captioning using deep learning, we develop a novel natural language person retrieval system based on an attention mechanism. More specifically, given the description of a person, the goal is to localize the person in an image. To this end, we first construct a benchmark dataset for natural language person retrieval: we generate bounding boxes for persons in a public image dataset from the segmentation masks, and these boxes are then annotated with descriptions and attributes using Amazon Mechanical Turk. We then adopt the region proposal network in Faster R-CNN as a candidate region generator. The cropped images based on the region proposals, as well as the whole images with attention weights, are fed into Convolutional Neural Networks for visual feature extraction, while the natural language expression and attributes are input to Bidirectional Long Short-Term Memory (BLSTM) models for text feature extraction. The visual and text features are integrated to score the region proposals, and the proposal with the highest score is retrieved as the output of our system. The experimental results show significant improvement over the state-of-the-art method for generic object retrieval, and this line of research promises to benefit search in surveillance video footage.
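Two steps of the pipeline described above can be illustrated with a minimal sketch: deriving a bounding box from a segmentation mask, and retrieving the region proposal with the highest score. The abstract does not specify how visual and text features are integrated, so the cosine similarity used here is only an assumed stand-in, and the function names (`mask_to_bbox`, `cosine`, `retrieve`) and toy feature vectors are hypothetical placeholders for the CNN and BLSTM outputs.

```python
from math import sqrt


def mask_to_bbox(mask):
    """Return (x_min, y_min, x_max, y_max) enclosing the nonzero pixels
    of a 2-D mask given as a list of rows, or None if the mask is empty."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for row in mask for x, value in enumerate(row) if value]
    return (min(cols), rows[0], max(cols), rows[-1])


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(proposals, visual_feats, text_feat):
    """Score every proposal against the text feature and return the best one.

    ASSUMPTION: cosine similarity stands in for the paper's scoring model,
    which the abstract does not describe.
    """
    scores = [cosine(feat, text_feat) for feat in visual_feats]
    best = max(range(len(scores)), key=scores.__getitem__)
    return proposals[best], scores


# Toy usage: one mask, two candidate boxes, 2-D placeholder features.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]
bbox = mask_to_bbox(mask)

proposals = [(0, 0, 10, 10), (5, 5, 20, 20)]
visual_feats = [[1.0, 0.0], [0.6, 0.8]]
text_feat = [0.0, 1.0]
best_box, scores = retrieve(proposals, visual_feats, text_feat)
```

In a real system the placeholder vectors would be replaced by learned visual and text embeddings, and the similarity function by whatever trained scoring layer the model uses; only the final "take the highest-scoring proposal" step is stated directly in the abstract.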


