Siamese Attentional Keypoint Network for High Performance Visual Tracking

by Peng Gao, et al.

In this paper, we investigate the impact of three main aspects of visual tracking, i.e., the backbone network, the attentional mechanism, and the detection component, and propose a Siamese Attentional Keypoint Network, dubbed SATIN, to achieve efficient tracking and accurate localization. First, a new lightweight Siamese hourglass network is designed specifically for visual tracking. It exploits repeated bottom-up and top-down inference to capture both global and local contextual information at multiple scales. Second, a novel cross-attentional module leverages both channel-wise and spatial intermediate attentional information, which enhances both the discriminative and the localization capabilities of the feature maps. Third, a keypoint detection approach is introduced to track any target object by detecting the top-left corner point, the centroid point, and the bottom-right corner point of its bounding box. To the best of our knowledge, we are the first to propose this approach. As a result, our SATIN tracker not only has a strong capability to learn more effective object representations, but is also efficient in computation and memory storage during both training and testing. Without bells and whistles, experimental results demonstrate that our approach achieves state-of-the-art performance on several recent benchmark datasets, at speeds far exceeding the frame-rate requirement.
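To make the three-keypoint idea concrete, the following is a minimal sketch of how a bounding box might be assembled from a predicted top-left corner, centroid, and bottom-right corner. It is not the SATIN implementation: the function name `box_from_keypoints`, the `Box` type, the tolerance `tol`, and the rule that the centroid should lie near the geometric center of the box are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Box:
    x1: float
    y1: float
    x2: float
    y2: float


def box_from_keypoints(top_left: Point, centroid: Point,
                       bottom_right: Point, tol: float = 0.25) -> Optional[Box]:
    """Form a box from two corners and keep it only if the centroid agrees.

    Assumption (not from the paper): the predicted centroid is expected to
    fall within `tol` * width (and `tol` * height) of the geometric center.
    """
    (x1, y1), (cx, cy), (x2, y2) = top_left, centroid, bottom_right
    w, h = x2 - x1, y2 - y1
    if w <= 0 or h <= 0:
        # The corners are in the wrong order, so no valid box exists.
        return None
    mid_x, mid_y = x1 + w / 2, y1 + h / 2
    if abs(cx - mid_x) > tol * w or abs(cy - mid_y) > tol * h:
        # The centroid disagrees with the corners; reject the candidate.
        return None
    return Box(x1, y1, x2, y2)
```

In this sketch the centroid acts only as a consistency check on the corner pair; how the actual tracker combines the three keypoints is not specified in the abstract.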



Siamese Keypoint Prediction Network for Visual Object Tracking

Visual object tracking aims to estimate the location of an arbitrary tar...

Visual Tracking by TridentAlign and Context Embedding

Recent advances in Siamese network-based visual tracking methods have en...

Learning Cascaded Siamese Networks for High Performance Visual Tracking

Visual tracking is one of the most challenging computer vision problems....

SiamRCR: Reciprocal Classification and Regression for Visual Object Tracking

Recently, most siamese network based trackers locate targets via object ...

CXTrack: Improving 3D Point Cloud Tracking with Contextual Information

3D single object tracking plays an essential role in many applications, ...

Learning Reinforced Attentional Representation for End-to-End Visual Tracking

Despite the fact that tremendous advances have been made by numerous rec...

DCF-ASN: Coarse-to-fine Real-time Visual Tracking via Discriminative Correlation Filter and Attentional Siamese Network

Discriminative correlation filters (DCF) and siamese networks have achie...
