Online Deep Clustering with Video Track Consistency

06/07/2022
by Alessandra Alfani, et al.

Several unsupervised and self-supervised approaches have been developed in recent years to learn visual features from large-scale unlabeled datasets. Their main drawback, however, is that they struggle to recognize the visual features of the same object when it is simply rotated or when the camera perspective changes. To overcome this limitation while exploiting a useful source of supervision, we take video object tracks into account. Following the intuition that two patches in a track should have similar visual representations in a learned feature space, we adopt an unsupervised clustering-based approach and constrain such representations to be labeled as the same category, since they likely belong to the same object or object part. Experimental results on two downstream tasks on different datasets demonstrate the effectiveness of our Online Deep Clustering with Video Track Consistency (ODCT) approach compared to prior work, which did not leverage temporal information. In addition, we show that exploiting an unsupervised, class-agnostic, yet noisy track generator yields better accuracy than relying on costly and precise track annotations.
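The track-consistency constraint can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how a shared label is chosen, so the majority vote below is an assumption, and the function name `track_consistent_labels` and the toy inputs are hypothetical. Given a cluster id per patch and the track each patch belongs to, every patch in a track receives that track's most frequent cluster id as its pseudo-label.

```python
from collections import Counter, defaultdict


def track_consistent_labels(cluster_ids, track_ids):
    """Give every patch in a track the track's most common cluster id.

    Majority voting is an illustrative assumption, not the rule used in
    the paper. Both arguments are equal-length sequences, one entry per patch.
    """
    # Group the per-patch cluster ids by the track they belong to.
    by_track = defaultdict(list)
    for cluster, track in zip(cluster_ids, track_ids):
        by_track[track].append(cluster)

    # Pick one pseudo-label per track by majority vote.
    majority = {
        track: Counter(clusters).most_common(1)[0][0]
        for track, clusters in by_track.items()
    }

    # Broadcast the track-level label back to every patch.
    return [majority[track] for track in track_ids]


# Toy example: track "a" has one patch that was clustered differently.
cluster_ids = [3, 3, 7, 1, 1, 1]
track_ids = ["a", "a", "a", "b", "b", "b"]
labels = track_consistent_labels(cluster_ids, track_ids)
```

In this toy input, the outlier patch in track "a" is relabeled to agree with the rest of its track, which is the kind of consistency the abstract describes for patches that likely show the same object.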

