Analysis of Ward's Method

07/11/2019
by Anna Großwendt, et al.

We study Ward's method for the hierarchical k-means problem. This popular greedy heuristic is based on the complete linkage paradigm: starting with all data points as singleton clusters, it successively merges two clusters to form a clustering with one fewer cluster. The pair of clusters is chosen to (locally) minimize the k-means cost of the clustering in the next step. Complete linkage algorithms are very popular for hierarchical clustering problems, yet their theoretical properties have been studied relatively little. For the Euclidean k-center problem, Ackermann et al. show that the k-clustering in the hierarchy computed by complete linkage has a worst-case approximation ratio of Θ(log k). If the data lies in R^d for constant dimension d, the guarantee improves to O(1), but the O-notation hides a linear dependence on d. Complete linkage for k-median or k-means has not been analyzed so far. In this paper, we show that Ward's method computes a 2-approximation with respect to the k-means objective function if the optimal k-clustering is well separated. If the optimal clustering additionally satisfies a balance condition, then Ward's method fully recovers the optimal solution. These results hold in arbitrary dimension. We accompany our positive results with a lower bound of Ω((3/2)^d) for data sets in R^d that holds if no separation is guaranteed, and with lower bounds for the case that the guaranteed separation is not sufficiently strong. Finally, we show that Ward's method produces an O(1)-approximate clustering for one-dimensional data sets.
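The greedy merge rule described above can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names (`centroid`, `kmeans_cost`, `merge_cost`, `ward`) are hypothetical, and the sketch relies on the standard identity that merging clusters A and B raises the k-means cost by |A||B|/(|A|+|B|) times the squared distance between their centroids. It favors readability over speed, rescanning all pairs in every round.

```python
from itertools import combinations


def centroid(cluster):
    """Mean of a non-empty list of equal-length point tuples."""
    dim = len(cluster[0])
    return tuple(sum(p[i] for p in cluster) / len(cluster) for i in range(dim))


def kmeans_cost(clusters):
    """Sum of squared Euclidean distances of points to their cluster centroid."""
    total = 0.0
    for cluster in clusters:
        mu = centroid(cluster)
        total += sum(sum((x - m) ** 2 for x, m in zip(p, mu)) for p in cluster)
    return total


def merge_cost(a, b):
    """Increase in k-means cost caused by merging clusters a and b."""
    mu_a, mu_b = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(mu_a, mu_b))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward(points, k):
    """Greedily merge singleton clusters until only k clusters remain."""
    clusters = [[tuple(p)] for p in points]
    while len(clusters) > k:
        # Pick the pair whose merge increases the k-means cost the least.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for t, c in enumerate(clusters) if t not in (i, j)]
        clusters.append(merged)
    return clusters
```

For example, `ward([(0.0,), (1.0,), (10.0,), (11.0,), (50.0,)], 3)` groups the two nearby pairs and leaves the outlier as a singleton. Recording the clustering after every merge, rather than stopping at k, would yield the full hierarchy.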


research
02/28/2020

Explainable k-Means and k-Medians Clustering

Clustering is a popular form of unsupervised learning for geometric data...
research
12/19/2017

Approximate Correlation Clustering Using Same-Cluster Queries

Ashtiani et al. (NIPS 2016) introduced a semi-supervised framework for c...
research
09/07/2020

Achieving anonymity via weak lower bound constraints for k-median and k-means

We study k-clustering problems with lower bounds, including k-median and...
research
05/03/2022

The Price of Hierarchical Clustering

Hierarchical Clustering is a popular tool for understanding the heredita...
research
06/09/2021

On Clusters that are Separated but Large

Given a set P of n points in R^d, consider the problem of computing k sub...
research
08/19/2013

A balanced k-means algorithm for weighted point sets

The classical k-means algorithm for partitioning n points in R^d into k ...
research
06/07/2019

Benchmarking Minimax Linkage

Minimax linkage was first introduced by Ao et al. [3] in 2004, as an alt...
