Global Instance Tracking: Locating Target More Like Humans

by Shiyu Hu, et al.

Target tracking, an essential ability of the human visual system, has been simulated by computer vision tasks. However, existing trackers perform well in constrained experimental environments but fail under challenges such as occlusion and fast motion. This large gap indicates that research has measured only tracking performance rather than intelligence. How can the intelligence level of trackers be judged scientifically? Unlike decision-making problems, tracking lacks three requirements (a challenging task, a fair environment, and a scientific evaluation procedure), which makes the question difficult to answer. In this article, we first propose the global instance tracking (GIT) task, which searches for an arbitrary user-specified instance in a video without any assumptions about camera or motion consistency, to model the human visual tracking ability. Next, we construct a high-quality, large-scale benchmark, VideoCube, to create a challenging environment. Finally, we design a scientific evaluation procedure that uses human capabilities as the baseline to judge tracking intelligence. Additionally, we provide an online platform with a toolkit and an updated leaderboard. Although the experimental results indicate a definite gap between trackers and humans, we expect to take a step forward toward authentic human-like trackers. The database, toolkit, evaluation server, and baseline results are available at




