Hierarchical Clustering using Randomly Selected Similarities

07/19/2012
by Brian Eriksson, et al.

The problem of hierarchically clustering items from pairwise similarities arises across many scientific disciplines, from biology to networking. Often, applications of clustering techniques are limited by the cost of obtaining similarities between pairs of items. While prior work has reconstructed the clustering from a significantly reduced set of pairwise similarities via adaptive measurements, these techniques apply only when the user can choose which similarities to observe. In this paper, we examine reconstructing a hierarchical clustering from similarities observed at random. We derive precise bounds showing that a significant fraction of the hierarchical clustering can be recovered using fewer than all of the pairwise similarities. We find that the correct hierarchical clustering down to a constant fraction of the total number of items (i.e., clusters of size O(N)) can be found using only O(N log N) randomly selected pairwise similarities in expectation.
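The abstract does not spell out the recovery procedure, so the following is only a rough illustration of why O(N log N) random similarities can be enough for large clusters; it is not the authors' algorithm. It assumes a hypothetical noise-free two-level hierarchy over N items (two halves, each split into two quarters), a budget of m = ⌈c·N·ln N⌉ distinct pairs drawn uniformly at random (c = 5 is an arbitrary assumed constant), and a simple threshold-and-connect step (single-linkage style, via union-find) applied only to the observed pairs. The names `sample_pairs`, `components`, `similarity` and `recover_levels` are illustrative.

```python
import math
import random


def sample_pairs(n, m, rng):
    """Draw m distinct unordered pairs (i, j) with i < j, uniformly at random."""
    total = n * (n - 1) // 2
    m = min(m, total)
    pairs = set()
    while len(pairs) < m:
        i, j = rng.sample(range(n), 2)
        pairs.add((min(i, j), max(i, j)))
    return pairs


def components(n, edges):
    """Connected components of the graph on range(n) via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
    groups = {}
    for x in range(n):
        groups.setdefault(find(x), set()).add(x)
    return [frozenset(g) for g in groups.values()]


def similarity(i, j, n):
    """Hypothetical noise-free hierarchy: halves of the items split into quarters."""
    if 4 * i // n == 4 * j // n:
        return 2.0  # same quarter (lower-level cluster)
    if 2 * i // n == 2 * j // n:
        return 1.0  # same half (top-level cluster)
    return 0.0


def recover_levels(n, c=5.0, seed=0):
    """Observe ~c*N*ln(N) random similarities, then threshold and connect per level."""
    rng = random.Random(seed)
    m = math.ceil(c * n * math.log(n))  # O(N log N) observation budget (assumed c)
    observed = {(i, j): similarity(i, j, n) for i, j in sample_pairs(n, m, rng)}
    levels = {}
    for threshold in (1.5, 0.5):  # 1.5 -> quarters, 0.5 -> halves
        edges = [p for p, s in observed.items() if s > threshold]
        levels[threshold] = components(n, edges)
    return m, levels


if __name__ == "__main__":
    n = 2000
    m, levels = recover_levels(n)
    print("observed pairs:", m, "fraction:", m / (n * (n - 1) // 2))
    for t, comps in levels.items():
        print("threshold", t, "-> cluster sizes", sorted(len(g) for g in comps))
```

With N = 2000 the budget covers only about 3.8% of the N(N−1)/2 pairs, because m / (N(N−1)/2) ≈ 2c·ln N / (N − 1), a fraction that shrinks as N grows. Each 500-item quarter still sees roughly 19 observed intra-cluster neighbors per item, well above ln 500 ≈ 6.2, so its observed subgraph is connected with high probability. Much smaller clusters would receive too few observations to be connected this way, which is consistent with the abstract restricting its guarantee to clusters whose size is a constant fraction of N.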


research
02/18/2011

Active Clustering: Robust and Efficient Hierarchical Clustering using Adaptively Selected Similarities

Hierarchical clustering based on pairwise similarities is a common tool ...
research
10/16/2015

A cost function for similarity-based hierarchical clustering

The development of algorithms for hierarchical clustering has been hampe...
research
02/20/2023

Active Learning with Positive and Negative Pairwise Feedback

In this paper, we propose a generic framework for active clustering with...
research
10/25/2021

Shift of Pairwise Similarities for Data Clustering

Several clustering methods (e.g., Normalized Cut and Ratio Cut) divide t...
research
11/29/2022

A Revenue Function for Comparison-Based Hierarchical Clustering

Comparison-based learning addresses the problem of learning when, instea...
research
05/20/2016

Fast Randomized Semi-Supervised Clustering

We consider the problem of clustering partially labeled data from a mini...
research
03/10/2023

Hierarchical Clustering with OWA-based Linkages, the Lance-Williams Formula, and Dendrogram Inversions

Agglomerative hierarchical clustering based on Ordered Weighted Averagin...
