Multi-direction and Multi-scale Pyramid in Transformer for Video-based Pedestrian Retrieval

02/12/2022
by Xianghao Zang, et al.
In video surveillance, pedestrian retrieval (also called person re-identification) is a critical task: it aims to retrieve a pedestrian of interest across non-overlapping cameras. Transformer-based models have recently made significant progress on this task, but they still overlook fine-grained, part-informed information. This paper proposes a multi-direction and multi-scale Pyramid in Transformer (PiT) to address this problem. In a transformer-based architecture, each pedestrian image is split into many patches, which are fed to transformer layers to obtain the feature representation of the image. To capture fine-grained information, the paper applies vertical and horizontal division to these patches, generating human parts in different directions. To fuse multi-scale feature representations, it presents a pyramid structure containing global-level information and many pieces of local-level information at different scales. The feature pyramids of all pedestrian images from the same video are then fused to form the final multi-direction and multi-scale feature representation. Experimental results on two challenging video-based benchmarks, MARS and iLIDS-VID, show that the proposed PiT achieves state-of-the-art performance, and extensive ablation studies demonstrate the superiority of the proposed pyramid structure. The code is available at https://git.openi.org.cn/zangxh/PiT.git.
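
The pipeline described in the abstract (patch grid, division in two directions, multi-scale pyramid, fusion across frames) can be illustrated with a rough sketch. This is not the authors' implementation: the real model feeds patches and parts through transformer layers, whereas the sketch below substitutes plain mean pooling so that it stays self-contained. The function names, the `scales` parameter, the choice of 2 and 4 parts, and frame-wise averaging as the fusion step are all assumptions made for illustration.

```python
from statistics import fmean


def mean_pool(vectors):
    """Average a non-empty list of equal-length feature vectors."""
    return [fmean(column) for column in zip(*vectors)]


def flatten(grid):
    """Turn a 2-D grid of patch vectors into a flat list of vectors."""
    return [vec for row in grid for vec in row]


def split_rows(grid, parts):
    """Divide the patch grid into `parts` bands of consecutive rows.

    Assumes the number of rows is divisible by `parts`.
    """
    step = len(grid) // parts
    return [grid[i * step:(i + 1) * step] for i in range(parts)]


def split_cols(grid, parts):
    """Divide the patch grid into `parts` bands of consecutive columns.

    Assumes the number of columns is divisible by `parts`.
    """
    step = len(grid[0]) // parts
    return [[row[i * step:(i + 1) * step] for row in grid]
            for i in range(parts)]


def frame_pyramid(grid, scales=(2, 4)):
    """Build [global, part, part, ...] features for one frame.

    Mean pooling stands in for the transformer layers of the paper
    (an assumption of this sketch, not the published method).
    """
    features = [mean_pool(flatten(grid))]  # global level
    for parts in scales:  # one pyramid level per scale
        for sub in split_rows(grid, parts) + split_cols(grid, parts):
            features.append(mean_pool(flatten(sub)))
    return features


def video_pyramid(frames, scales=(2, 4)):
    """Fuse per-frame pyramids by averaging each entry across frames.

    Averaging is a hypothetical fusion rule chosen for simplicity.
    """
    per_frame = [frame_pyramid(grid, scales) for grid in frames]
    return [mean_pool(list(level)) for level in zip(*per_frame)]
```

With a 4 x 4 patch grid and `scales=(2, 4)`, each frame yields 1 global vector plus 2 x 2 + 2 x 4 = 12 part vectors, and `video_pyramid` returns the same 13 entries averaged over all frames of the video.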


research
12/07/2020

Fine-Grained Dynamic Head for Object Detection

The Feature Pyramid Network (FPN) presents a remarkable approach to alle...
research
09/28/2017

HydraPlus-Net: Attentive Deep Features for Pedestrian Analysis

Pedestrian analysis plays a vital role in intelligent video surveillance...
research
11/26/2020

Fine-Grained Re-Identification

Research into the task of re-identification (ReID) is picking up momentu...
research
09/07/2021

FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting

Transformer, as a strong and flexible architecture for modelling long-ra...
research
02/26/2022

Orientation-Discriminative Feature Representation for Decentralized Pedestrian Tracking

This paper focuses on the problem of decentralized pedestrian tracking u...
research
07/13/2022

Pyramid Transformer for Traffic Sign Detection

Traffic sign detection is a vital task in the visual system of self-driv...
research
12/03/2020

Temporal Pyramid Network for Pedestrian Trajectory Prediction with Multi-Supervision

Predicting human motion behavior in a crowd is important for many applic...
