Applying Semi-Automated Hyperparameter Tuning for Clustering Algorithms

08/25/2021
by   Elizabeth Ditton, et al.
James Cook University
0

When approaching a clustering problem, choosing the right clustering algorithm and parameters is essential, as each clustering algorithm is proficient at finding clusters of a particular nature. Due to the unsupervised nature of clustering algorithms, there are no ground truth values available for empirical evaluation, which makes automation of the parameter selection process through hyperparameter tuning difficult. Previous approaches to hyperparameter tuning for clustering algorithms have relied on internal metrics, which are often biased towards certain algorithms, or having some ground truth labels available, moving the problem into the semi-supervised space. This preliminary study proposes a framework for semi-automated hyperparameter tuning of clustering problems, using a grid search to develop a series of graphs and easy to interpret metrics that can then be used for more efficient domain-specific evaluation. Preliminary results show that internal metrics are unable to capture the semantic quality of the clusters developed and approaches driven by internal metrics would come to different conclusions than those driven by manual evaluation.

READ FULL TEXT

page 1

page 2

page 3

11/24/2020

Automatic Clustering for Unsupervised Risk Diagnosis of Vehicle Driving for Smart Road

Early risk diagnosis and driving anomaly detection from vehicle stream a...
06/18/2020

Multi-Source Unsupervised Hyperparameter Optimization

How can we conduct efficient hyperparameter optimization for a completel...
04/08/2019

CRAD: Clustering with Robust Autocuts and Depth

We develop a new density-based clustering algorithm named CRAD which is ...
04/30/2013

Semi-Supervised Information-Maximization Clustering

Semi-supervised clustering aims to introduce prior knowledge in the deci...
09/23/2021

A Framework for Cluster and Classifier Evaluation in the Absence of Reference Labels

In some problem spaces, the high cost of obtaining ground truth labels n...
06/24/2020

Off-the-grid: Fast and Effective Hyperparameter Search for Kernel Clustering

Kernel functions are a powerful tool to enhance the k-means clustering a...
03/19/2020

Clustering with Fast, Automated and Reproducible assessment applied to longitudinal neural tracking

Across many areas, from neural tracking to database entity resolution, m...

Please sign up or login with your details

Forgot password? Click here to reset