A new distance measurement and its application in K-Means Algorithm

06/10/2022
by   Yiqun Zhang, et al.

K-Means is one of the most commonly used clustering algorithms because of its simplicity and efficiency. K-Means based on Euclidean distance considers only the straight-line distance between samples and ignores the overall distribution structure of the dataset (i.e. the fluid structure of the dataset). Because Euclidean distance struggles to describe the internal structure between two data points in high-dimensional data space, we propose a new distance measurement, view-distance, and apply it to the K-Means algorithm. On the classical manifold learning datasets S-curve and Swiss roll, the new distance not only clusters the data according to the structure of the data itself but also produces neat dividing lines as boundaries between categories. We also tested the classification accuracy and clustering effect of K-Means based on view-distance on some real-world datasets. The experimental results show that, on most datasets, K-Means based on view-distance improves classification accuracy and clustering effect to some degree.
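To make the idea of swapping the distance measure concrete, here is a minimal sketch of K-Means in which the distance is a pluggable function. It does not implement view-distance, which this abstract does not define; the names `kmeans`, `euclidean`, and `manhattan` are hypothetical, and keeping the mean-based centroid update for a non-Euclidean distance is an assumption of this sketch, not something the abstract states.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]
Distance = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Straight-line (Euclidean) distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Point, b: Point) -> float:
    """City-block distance, used here only as a stand-in alternative."""
    return sum(abs(x - y) for x, y in zip(a, b))


def kmeans(
    points: Sequence[Point],
    k: int,
    distance: Distance = euclidean,
    n_iter: int = 100,
    seed: int = 0,
) -> Tuple[List[int], List[List[float]]]:
    """Lloyd-style K-Means with a caller-supplied distance function."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct input points.
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels = [-1] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid
        # under the chosen distance.
        new_labels = [
            min(range(k), key=lambda j: distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so stop early
        labels = new_labels
        # Update step (assumption): coordinate-wise mean of the members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels, centroids = kmeans(data, k=2, distance=euclidean)
    print(labels, centroids)
```

Passing a different callable as `distance` changes only the assignment step, which is where a structure-aware measure such as the proposed view-distance would plug in under this sketch's assumptions.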


research
02/24/2015

New HSL Distance Based Colour Clustering Algorithm

In this paper, we define a distance for the HSL colour system. Next, the...
research
01/07/2022

Probabilistic spatial clustering based on the Self Discipline Learning (SDL) model of autonomous learning

Unsupervised clustering algorithm can effectively reduce the dimension o...
research
02/20/2022

Clustering by the Probability Distributions from Extreme Value Theory

Clustering is an essential task to unsupervised learning. It tries to au...
research
08/07/2023

Wide Gaps and Clustering Axioms

The widely applied k-means algorithm produces clusterings that violate o...
research
11/27/2018

Tackling Early Sparse Gradients in Softmax Activation Using Leaky Squared Euclidean Distance

Softmax activation is commonly used to output the probability distributi...
research
01/01/2021

A Multi-disciplinary Ensemble Algorithm for Clustering Heterogeneous Datasets

Clustering is a commonly used method for exploring and analysing data wh...
research
03/01/2020

Statistical power for cluster analysis

Cluster algorithms are gaining in popularity due to their compelling abi...
