SMLSOM: The shrinking maximum likelihood self-organizing map

by Ryosuke Motegi, et al.

Determining the number of clusters in a dataset is a fundamental issue in data clustering. Many methods have been proposed to select the number of clusters, treating it as a model selection problem. This paper proposes a greedy algorithm that automatically selects a suitable number of clusters within a probability distribution model framework. The algorithm has two components. First, it introduces a generalization of Kohonen's self-organizing map (SOM) in which each node is linked to a probability distribution model, so that the winner is searched for based on the likelihood of each node. Second, the proposed method uses a graph structure, with neighborhoods defined by the length of the shortest path between nodes, in contrast to Kohonen's SOM, in which the nodes are fixed in Euclidean space. This makes it possible to update the graph structure by cutting links to weakly connected nodes, which avoids unnecessary node deletion. The weakness of a node connection is measured using the Kullback–Leibler divergence, and the redundancy of a node is measured by the minimum description length (MDL). This updating step makes it easy to determine a suitable number of clusters. Compared with existing methods, our proposed method is computationally efficient and can accurately select the number of clusters and perform clustering.
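The three ingredients named in the abstract (likelihood-based winner search, shortest-path neighborhoods on a graph, and a Kullback–Leibler measure between nodes) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each node carries an isotropic Gaussian, a choice made here only for simplicity, and the names `gaussian_loglik`, `find_winner`, `graph_distances`, and `kl_isotropic` are hypothetical. The MDL-based redundancy test and the update rules are not specified in the abstract and are left out.

```python
import math
from collections import deque


def gaussian_loglik(x, mean, var):
    """Log-density of x under an isotropic Gaussian N(mean, var * I) (assumed model)."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * (d * math.log(2 * math.pi * var) + sq / var)


def find_winner(x, nodes):
    """Return the index of the node whose model gives x the highest log-likelihood."""
    return max(
        range(len(nodes)),
        key=lambda k: gaussian_loglik(x, nodes[k]["mean"], nodes[k]["var"]),
    )


def graph_distances(adj, start):
    """Breadth-first search: shortest-path length (in links) from start to each reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def kl_isotropic(mean_p, var_p, mean_q, var_q):
    """KL( N(mean_p, var_p I) || N(mean_q, var_q I) ) for isotropic Gaussians."""
    d = len(mean_p)
    sq = sum((a - b) ** 2 for a, b in zip(mean_p, mean_q))
    return 0.5 * (d * var_p / var_q + sq / var_q - d + d * math.log(var_q / var_p))


# Illustrative data (made up for this sketch): three nodes linked in a chain 0 - 1 - 2.
nodes = [
    {"mean": (0.0, 0.0), "var": 1.0},
    {"mean": (1.0, 1.0), "var": 1.0},
    {"mean": (5.0, 5.0), "var": 1.0},
]
adj = {0: {1}, 1: {0, 2}, 2: {1}}

winner = find_winner((0.9, 1.1), nodes)   # node 1 has the closest mean
hops = graph_distances(adj, winner)       # neighborhood measured in links

# Cutting the link 1 - 2 changes the neighborhood without deleting any node.
adj[1].discard(2)
adj[2].discard(1)
hops_after_cut = graph_distances(adj, winner)
```

In this sketch, a node that becomes unreachable after a cut simply drops out of the winner's neighborhood, which is one way to read the abstract's point that links can be cut without removing nodes.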




