Asymptotics for The k-means

11/18/2022
by Tonglin Zhang, et al.

The k-means method is one of the most important unsupervised learning techniques in statistics and computer science. Its goal is to partition a data set into k clusters so that observations within a cluster are as homogeneous as possible and observations in different clusters are as heterogeneous as possible. Although the method is well known, the investigation of its asymptotic properties lags far behind, which makes it difficult to develop more precise k-means methods in practice. To address this issue, a new concept called clustering consistency is proposed. Fundamentally, the proposed clustering consistency is more appropriate for clustering methods than the previously used criterion consistency. Using this concept, a new k-means method is proposed. The proposed method is found to have lower clustering error rates and to be more robust to small clusters and outliers than existing k-means methods. When k is unknown, the proposed method can also identify the number of clusters by using the gap statistic, something rarely achieved by the existing k-means methods adopted in many software packages.
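The abstract does not spell out the proposed method or the clustering-consistency criterion, so the sketch below illustrates only the standard building blocks it refers to, under stated assumptions. It shows Lloyd's algorithm for the usual k-means objective (minimizing the within-cluster sum of squares, which is what "homogeneous within, heterogeneous between" amounts to), seeded with k-means++, a common choice swapped in here. It also shows the gap statistic of Tibshirani, Walther and Hastie for choosing k, using uniform reference data over the bounding box of the observations. All function names (`sq_dist`, `nearest`, `plus_plus_init`, `kmeans`, `gap_statistic`) and defaults (`n_init`, `n_ref`, `max_iter`) are hypothetical illustrations, not the authors' method or any library's API.

```python
# Illustrative sketch of standard k-means (Lloyd + k-means++) and the gap statistic.
# NOT the paper's proposed k-means method; names and defaults are hypothetical.
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centres):
    """Index of the centre closest to point p."""
    return min(range(len(centres)), key=lambda j: sq_dist(p, centres[j]))


def plus_plus_init(points, k, rng):
    """k-means++ seeding: each new centre is drawn with probability proportional to D(x)^2."""
    centres = [list(rng.choice(points))]
    while len(centres) < k:
        d2 = [sq_dist(p, centres[nearest(p, centres)]) for p in points]
        if sum(d2) == 0:  # every point already coincides with a centre
            centres.append(list(rng.choice(points)))
        else:
            centres.append(list(rng.choices(points, weights=d2)[0]))
    return centres


def kmeans(points, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's algorithm with restarts; returns (centres, labels, wss) of the best run."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        centres = plus_plus_init(points, k, rng)
        labels = None
        for _ in range(max_iter):
            # Assignment step: attach every point to its nearest centre.
            new_labels = [nearest(p, centres) for p in points]
            if new_labels == labels:
                break  # partition is stable, so this run has converged
            labels = new_labels
            # Update step: move each centre to the mean of its members.
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                if members:  # an empty cluster keeps its previous centre
                    centres[j] = [sum(col) / len(members) for col in zip(*members)]
        # Within-cluster sum of squares of the final partition.
        wss = sum(sq_dist(p, centres[lab]) for p, lab in zip(points, labels))
        if best is None or wss < best[2]:
            best = (centres, labels, wss)
    return best


def gap_statistic(points, k_max, n_ref=10, seed=0):
    """Choose k by the gap statistic with uniform references over the data's bounding box."""
    rng = random.Random(seed)
    cols = list(zip(*points))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    gaps, s = [], []
    for k in range(1, k_max + 1):
        log_w = math.log(kmeans(points, k, seed=seed)[2])
        ref_logs = []
        for b in range(n_ref):
            # Reference data set: same size, uniform over each feature's observed range.
            ref = [[rng.uniform(a, z) for a, z in zip(lo, hi)] for _ in points]
            ref_logs.append(math.log(kmeans(ref, k, seed=seed + b + 1)[2]))
        mean_ref = sum(ref_logs) / n_ref
        sd = math.sqrt(sum((r - mean_ref) ** 2 for r in ref_logs) / n_ref)
        gaps.append(mean_ref - log_w)            # Gap(k)
        s.append(sd * math.sqrt(1 + 1 / n_ref))  # s_k
    # Smallest k with Gap(k) >= Gap(k+1) - s_{k+1}.
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - s[k]:
            return k
    return k_max
```

As a usage example, three tight, well-separated groups of nine points each can be passed to `kmeans(points, 3)` for a fixed k, or to `gap_statistic(points, k_max=4)` when k is unknown. The restarts (`n_init`) are there because Lloyd's algorithm only finds a local optimum; keeping the run with the smallest within-cluster sum of squares is a simple, common safeguard.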
