Semi-supervised K-means++

by Jordan Yoder, et al.
Johns Hopkins University

Traditionally, practitioners initialize the k-means algorithm with centers chosen uniformly at random. Randomized initialization with uneven weights (k-means++) has recently been used to improve on this strategy in both cost and run-time. We consider the k-means problem with semi-supervised information, where some of the data are pre-labeled, and we seek to label the rest according to the minimum-cost solution. By extending the k-means++ algorithm and its analysis to account for the labels, we derive an improved theoretical bound on expected cost and observe improved performance on simulated and real data. This analysis provides theoretical justification for a roughly linear-time semi-supervised clustering algorithm.
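To make the idea concrete, here is a minimal sketch of semi-supervised k-means++ seeding. This is an illustration, not the authors' exact procedure: it assumes each pre-labeled class contributes one center (its labeled mean), and the remaining centers are drawn with the standard D² weighting of k-means++. The function name and the convention that unlabeled points carry label -1 are our own.

```python
import numpy as np

def ss_kmeanspp_seed(X, labels, k, rng=None):
    """Sketch of semi-supervised k-means++ seeding (assumed variant):
    fix one center per observed class at its labeled mean, then fill
    the remaining centers by D^2-weighted sampling over all points.
    Unlabeled points are marked with label -1."""
    rng = np.random.default_rng(rng)
    centers = []
    # One center per observed label: the mean of its labeled points.
    for c in np.unique(labels[labels >= 0]):
        centers.append(X[labels == c].mean(axis=0))
    if not centers:
        # No supervision at all: fall back to plain k-means++,
        # choosing the first center uniformly at random.
        centers.append(X[rng.integers(len(X))])
    # Fill remaining centers by D^2 sampling, as in k-means++.
    while len(centers) < k:
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)
```

The seeds returned here would then initialize a standard Lloyd's-iteration loop, with the labeled points constrained to their given clusters.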




