Fair Algorithms for Hierarchical Agglomerative Clustering

05/07/2020
by Anshuman Chhabra, et al.

Hierarchical Agglomerative Clustering (HAC) algorithms are extensively utilized in modern data science and machine learning. They seek to partition a dataset into clusters while also generating a hierarchical relationship between the data samples themselves. HAC algorithms are employed in a number of application areas, such as biology, natural language processing, and recommender systems. It is therefore imperative to ensure that these algorithms are fair: even if the dataset contains biases against certain protected groups, the cluster outputs generated should not discriminate against samples from any of these groups. However, recent work on clustering fairness has mostly focused on center-based clustering algorithms, such as k-median and k-means clustering. In this paper, we propose fair algorithms for performing HAC that 1) enforce fairness constraints irrespective of the distance linkage criterion used, 2) generalize to any natural measure of clustering fairness for HAC, 3) work for multiple protected groups, and 4) have running times competitive with vanilla HAC. To the best of our knowledge, this is the first work that studies fairness for HAC algorithms. We also propose an algorithm, with lower asymptotic time complexity than HAC algorithms, that can rectify existing HAC outputs to make them fair. Moreover, we carry out extensive experiments on multiple real-world UCI datasets to demonstrate how our algorithms work.
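
To make the fairness concern concrete, the following is a minimal sketch of vanilla (not fair) single-linkage HAC on a toy dataset, together with one possible way to check how well each cluster reflects the protected-group proportions of the whole dataset. It is not the paper's algorithm: the function names `single_linkage_hac`, `group_proportions`, and `max_proportion_gap`, the toy data, and the choice of "gap from dataset-wide proportions" as the fairness measure are all assumptions made for illustration, since the abstract does not fix a particular measure.

```python
from collections import Counter
from itertools import combinations


def single_linkage_hac(points, k):
    """Naive vanilla HAC: start from singletons and repeatedly merge
    the two closest clusters (single linkage) until k clusters remain."""
    clusters = [[i] for i in range(len(points))]

    def dist(a, b):
        # Euclidean distance between samples a and b.
        return sum((x - y) ** 2 for x, y in zip(points[a], points[b])) ** 0.5

    def linkage(c1, c2):
        # Single linkage: distance between the closest pair of members.
        return min(dist(a, b) for a in c1 for b in c2)

    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters


def group_proportions(members, groups):
    """Fraction of each protected group among the given sample indices."""
    counts = Counter(groups[i] for i in members)
    return {g: counts.get(g, 0) / len(members) for g in sorted(set(groups))}


def max_proportion_gap(clusters, groups):
    """Illustrative (assumed) unfairness score: the largest absolute gap
    between a cluster's group proportion and the dataset-wide proportion."""
    overall = group_proportions(range(len(groups)), groups)
    return max(
        abs(group_proportions(c, groups)[g] - overall[g])
        for c in clusters
        for g in overall
    )


# Hypothetical toy data: two well-separated blobs, each dominated by one group.
points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
groups = ["a", "a", "a", "b", "b", "b"]

clusters = single_linkage_hac(points, k=2)
for c in clusters:
    print(sorted(c), group_proportions(c, groups))
print("max gap:", max_proportion_gap(clusters, groups))
```

On this toy data, distance alone separates the two protected groups completely, so each cluster contains only one group even though the dataset as a whole is evenly split. A fair HAC procedure of the kind the abstract describes would have to constrain the merges so that clusters stay closer to the dataset-wide proportions.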


