Hard Regularization to Prevent Collapse in Online Deep Clustering without Data Augmentation

03/29/2023
by Louis Mahon, et al.

Online deep clustering refers to the joint use of a feature extraction network and a clustering model to assign cluster labels to each new data point or batch as it is processed. While faster and more versatile than offline methods, online clustering can easily reach the collapsed solution, in which the encoder maps all inputs to the same point and all are put into a single cluster. Successful existing models have employed various techniques to avoid this problem, most of which either require data augmentation or aim to make the average soft assignment across the dataset the same for each cluster. We propose a method that does not require data augmentation and that, unlike existing methods, regularizes the hard assignments. Using a Bayesian framework, we derive an intuitive optimization objective that can be straightforwardly included in the training of the encoder network. Tested on four image datasets, the method consistently avoids collapse more robustly than other methods and leads to more accurate clustering. We also conduct further experiments and analyses justifying our choice to regularize the hard cluster assignments.
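To make the distinction between soft and hard assignments concrete, here is a minimal sketch in Python with NumPy. It is not the objective derived in the paper. The function names (`soft_marginal`, `hard_marginal`, `entropy`) and the toy probabilities are assumptions chosen purely for illustration. It shows one way the average soft assignment can look almost perfectly balanced while every point's hard (argmax) assignment falls into the same cluster.

```python
import numpy as np

def soft_marginal(probs):
    """Average soft assignment per cluster, shape (K,)."""
    return probs.mean(axis=0)

def hard_marginal(probs):
    """Fraction of points whose argmax is each cluster, shape (K,)."""
    k = probs.shape[1]
    counts = np.bincount(probs.argmax(axis=1), minlength=k)
    return counts / probs.shape[0]

def entropy(p):
    """Shannon entropy in nats, treating 0 * log(0) as 0."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# Hypothetical toy data: 100 points, 3 clusters, and every point
# prefers cluster 0 by a very small margin.
probs = np.tile(np.array([0.34, 0.33, 0.33]), (100, 1))

soft = soft_marginal(probs)  # nearly uniform across the 3 clusters
hard = hard_marginal(probs)  # all mass on cluster 0

print("soft marginal:", soft, "entropy:", entropy(soft))
print("hard marginal:", hard, "entropy:", entropy(hard))
```

In this toy case the soft marginal has entropy close to the maximum of log 3, so a regularizer that only balances average soft assignments would see nothing wrong, whereas the hard marginal has zero entropy, which is exactly the collapsed solution described above.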

research
10/19/2022

Functional data clustering via information maximization

A new method for clustering functional data is proposed via information ...
research
06/07/2023

Interpretable Deep Clustering

Clustering is a fundamental learning task widely used as a first step in...
research
07/17/2020

OnlineAugment: Online Data Augmentation with Less Domain Knowledge

Data augmentation is one of the most important tools in training modern ...
research
07/08/2021

Augmented Data as an Auxiliary Plug-in Towards Categorization of Crowdsourced Heritage Data

In this paper, we propose a strategy to mitigate the problem of ineffici...
research
09/21/2022

Algorithm-Agnostic Interpretations for Clustering

A clustering outcome for high-dimensional data is typically interpreted ...
research
06/17/2020

Unsupervised Learning of Visual Features by Contrasting Cluster Assignments

Unsupervised image representations have significantly reduced the gap wi...
research
02/06/2013

An Information-Theoretic Analysis of Hard and Soft Assignment Methods for Clustering

Assignment methods are at the heart of many algorithms for unsupervised ...
