Learning Distributional Representation and Set Distance for Multi-shot Person Re-identification
Person re-identification aims to identify a specific person at distinct times and locations. It is challenging because of occlusion, illumination, and viewpoint changes across camera views. Recently, the multi-shot person re-id task has received more attention because it is closer to real-world applications. A key element of a good multi-shot person re-id algorithm is how it aggregates the appearance features of all images temporally. Most current approaches apply pooling strategies, some introducing multi-stage attention mechanisms, and obtain a fixed-size representation. However, we argue that representing a set of images as a single feature vector may lose the matching evidence between examples. In this work, we propose the idea of distributional representation, which interprets an image set as samples generated from a distribution in appearance feature space, and we learn a distributional set distance function to compare two image sets. Specifically, we choose the Wasserstein distance in this study. In this way, the proper alignment between two image sets can be discovered naturally in a non-parametric manner. Furthermore, the distance between distributions can serve as a supervision signal to fine-tune the appearance feature extractor in our model. Experimental results show that our proposed method achieves state-of-the-art performance on the MARS dataset.
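To make the idea of a set distance concrete, the following is a minimal sketch of a Wasserstein distance between two image sets, each treated as an empirical distribution over appearance features. It is an illustration under stated assumptions, not the model described above: the function name `set_wasserstein` is hypothetical, every image is given equal weight, the ground cost is assumed to be Euclidean, and the transport problem is solved exactly as a linear program with SciPy rather than learned or approximated.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.spatial.distance import cdist


def set_wasserstein(feats_a, feats_b):
    """1-Wasserstein distance between two feature sets (hypothetical helper).

    Assumptions: uniform weight per image and a Euclidean ground cost.
    Returns the distance and the optimal transport plan of shape (n, m).
    """
    n, m = len(feats_a), len(feats_b)
    cost = cdist(feats_a, feats_b)  # pairwise Euclidean costs, shape (n, m)

    # Marginal constraints on the row-major flattened plan:
    # each row of the plan sums to 1/n, each column sums to 1/m.
    row_sum = np.kron(np.eye(n), np.ones((1, m)))
    col_sum = np.kron(np.ones((1, n)), np.eye(m))
    a_eq = np.vstack([row_sum, col_sum])
    b_eq = np.concatenate([np.full(n, 1.0 / n), np.full(m, 1.0 / m)])

    res = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.fun, res.x.reshape(n, m)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tracklet_a = rng.normal(size=(5, 8))  # 5 images, 8-dim features (toy data)
    tracklet_b = rng.normal(size=(7, 8))  # sets may differ in size
    dist, plan = set_wasserstein(tracklet_a, tracklet_b)
    print(dist, plan.shape)
```

The returned plan is a soft alignment between the two sets: entry (i, j) is the mass moved from image i of the first set to image j of the second, which is one way to read the "alignment" mentioned above. No parameters are fitted to obtain it.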