Fair Algorithms for Hierarchical Agglomerative Clustering

by Anshuman Chhabra, et al.

Hierarchical Agglomerative Clustering (HAC) algorithms are extensively utilized in modern data science and machine learning. They seek to partition a dataset into clusters while also generating a hierarchical relationship between the data samples themselves. HAC algorithms are employed in a number of application areas, such as biology, natural language processing, and recommender systems. It is therefore imperative to ensure that these algorithms are fair: even if the dataset contains biases against certain protected groups, the cluster outputs generated should not discriminate against samples from any of these groups. However, recent work in clustering fairness has mostly focused on center-based clustering algorithms, such as k-median and k-means clustering. In this paper, we propose fair algorithms for performing HAC that 1) enforce fairness constraints irrespective of the distance linkage criteria used, 2) generalize to any natural measure of clustering fairness for HAC, 3) work for multiple protected groups, and 4) have running times competitive with vanilla HAC. To the best of our knowledge, this is the first work that studies fairness for HAC algorithms. We also propose an algorithm, with lower asymptotic time complexity than HAC algorithms, that can rectify existing HAC outputs and make them fair. Moreover, we carry out extensive experiments on multiple real-world UCI datasets to demonstrate the behavior of our algorithms.
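To make the setting concrete, the following is a minimal sketch of vanilla (fairness-unaware) HAC with an interchangeable linkage criterion, followed by a simple per-cluster check of protected-group proportions. It is not the fair algorithm proposed in the paper. The function names (`vanilla_hac`, `group_proportions`), the toy data, and the choice of comparing cluster proportions against dataset proportions are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def linkage_distance(c1, c2, points, linkage="single"):
    """Distance between two clusters (lists of point indices)."""
    dists = [euclidean(points[i], points[j]) for i in c1 for j in c2]
    if linkage == "single":
        return min(dists)
    if linkage == "complete":
        return max(dists)
    if linkage == "average":
        return sum(dists) / len(dists)
    raise ValueError(f"unknown linkage: {linkage}")


def vanilla_hac(points, k, linkage="single"):
    """Naive HAC: start from singletons, merge the closest pair until k clusters remain.

    Returns the final clusters and the sequence of merges (the hierarchy).
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > k:
        # Pick the pair of clusters with the smallest linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: linkage_distance(
                clusters[ab[0]], clusters[ab[1]], points, linkage
            ),
        )
        merges.append((tuple(clusters[a]), tuple(clusters[b])))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return clusters, merges


def group_proportions(cluster, groups):
    """Fraction of each protected group inside one cluster (illustrative measure)."""
    counts = Counter(groups[i] for i in cluster)
    return {g: n / len(cluster) for g, n in counts.items()}


# Toy data (hypothetical): two spatially separated blobs, each from one group.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
groups = ["a", "a", "a", "b", "b", "b"]

clusters, merges = vanilla_hac(points, k=2, linkage="single")
for cluster in clusters:
    print(sorted(cluster), group_proportions(cluster, groups))
```

In this toy dataset each group makes up half of the samples, yet every cluster produced by vanilla HAC contains a single group only. This kind of gap between dataset-level and cluster-level group proportions is what fairness constraints on HAC are meant to address, whichever linkage criterion is chosen.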




Fair Clustering Using Antidote Data

Clustering algorithms are widely utilized for many modern data science a...

Towards Fair Deep Clustering With Multi-State Protected Variables

Fair clustering under the disparate impact doctrine requires that popula...

Fairness Deconstructed: A Sociotechnical View of 'Fair' Algorithms in Criminal Justice

Early studies of risk assessment algorithms used in criminal justice rev...

Socially Fair Center-based and Linear Subspace Clustering

Center-based clustering (e.g., k-means, k-medians) and clustering using ...

Fair Minimum Representation Clustering

Clustering is an unsupervised learning task that aims to partition data ...

Fairness Degrading Adversarial Attacks Against Clustering Algorithms

Clustering algorithms are ubiquitous in modern data science pipelines, a...

Robust Fair Clustering: A Novel Fairness Attack and Defense Framework

Clustering algorithms are widely used in many societal resource allocati...
