Contextual Non-Local Alignment over Full-Scale Representation for Text-Based Person Search

01/08/2021
by   Chenyang Gao, et al.

Text-based person search aims at retrieving a target person from an image gallery using a descriptive sentence of that person. It is very challenging because the modal gap makes it more difficult to extract discriminative features effectively. Moreover, the inter-class variance of both pedestrian images and descriptions is small, so comprehensive information is needed to align visual and textual clues across all scales. Most existing methods merely consider the local alignment between images and texts within a single scale (e.g. only the global scale or only the partial scale), or simply construct the alignment at each scale separately. To address this problem, we propose a method that adaptively aligns image and textual features across all scales, called NAFS (i.e., Non-local Alignment over Full-Scale representations). Firstly, a novel staircase network structure is proposed to extract full-scale image features with better locality. Secondly, a BERT with locality-constrained attention is proposed to obtain representations of descriptions at different scales. Then, instead of separately aligning features at each scale, a novel contextual non-local attention mechanism is applied to simultaneously discover latent alignments across all scales. The experimental results show that our method outperforms the state-of-the-art methods by 5.53 in terms of top-5 on a text-based person search dataset. The code is available at https://github.com/TencentYoutuResearch/PersonReID-NAFS
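
The idea of aligning across all scales at once, rather than scale by scale, can be illustrated with a minimal sketch. The code below is not the NAFS implementation: it uses plain scaled dot-product attention as a stand-in for the contextual non-local attention described above, and the scale names (`global`, `region`, `sentence`, `word`), the toy 3-dimensional feature vectors, and the function `cross_scale_attention` are all assumptions made for illustration. It only shows that once features from every scale are pooled into one set, each textual feature can attend to image features of any scale.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cross_scale_attention(queries, keys):
    """Let every query attend over ALL keys, whatever scale they came from.

    Generic scaled dot-product attention, used here only as a stand-in
    for the paper's contextual non-local attention.
    """
    dim = len(keys[0])
    contexts, all_weights = [], []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(dim) for k in keys])
        context = [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(dim)]
        contexts.append(context)
        all_weights.append(weights)
    return contexts, all_weights


# Hypothetical toy features, grouped by scale (values are made up).
image_feats = {
    "global": [[1.0, 0.0, 0.0]],
    "region": [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
}
text_feats = {
    "sentence": [[1.0, 0.0, 0.0]],
    "word": [[0.0, 0.0, 4.0]],
}

# Pool every scale into one flat set so alignment is not restricted to
# matching scales (e.g. a word may align with a region or the whole image).
image_all = [vec for feats in image_feats.values() for vec in feats]
text_all = [vec for feats in text_feats.values() for vec in feats]

contexts, weights = cross_scale_attention(text_all, image_all)
for row in weights:
    print([round(w, 3) for w in row])
```

In this toy setup the sentence-level query puts its largest weight on the global image feature, while the word-level query puts its largest weight on the second region feature, even though both queries score against the same pooled set. A per-scale scheme would instead run a separate attention for each pair of scales and could not discover the word-to-region match unless that pair was wired up explicitly.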


research
05/25/2021

TIPCB: A Simple but Effective Part-based Convolutional Baseline for Text-based Person Search

Text-based person search is a sub-task in the field of image retrieval, ...
research
07/27/2021

Semantically Self-Aligned Network for Text-to-Image Part-aware Person Re-identification

Text-to-image person re-identification (ReID) aims to search for images ...
research
08/30/2022

Image-Specific Information Suppression and Implicit Local Alignment for Text-based Person Search

Text-based person search is a challenging task that aims to search pedes...
research
12/13/2021

Learning Semantic-Aligned Feature Representation for Text-based Person Search

Text-based person search aims to retrieve images of a certain pedestrian...
research
11/16/2022

Person Text-Image Matching via Text-Feature Interpretability Embedding and External Attack Node Implantation

Person text-image matching, also known as text based person search, aims...
research
10/20/2021

Text-Based Person Search with Limited Data

Text-based person search (TBPS) aims at retrieving a target person from ...
research
05/22/2021

Video-based Person Re-identification without Bells and Whistles

Video-based person re-identification (Re-ID) aims at matching the video ...
