Interpretable Clustering via Optimal Trees

12/03/2018
by   Dimitris Bertsimas, et al.

State-of-the-art clustering algorithms use heuristics to partition the feature space and provide little insight into the rationale for cluster membership, which limits their interpretability. In healthcare applications, this lack of interpretability is a barrier to adoption, since medical researchers must explain their decisions in detail in order to gain patient trust and limit liability. We present a new unsupervised learning algorithm that leverages Mixed Integer Optimization techniques to generate interpretable tree-based clustering models. Utilizing the flexible framework of Optimal Trees, our method approximates the globally optimal solution, leading to high-quality partitions of the feature space. Our algorithm can incorporate various internal validation metrics, naturally determines the optimal number of clusters, and can account for mixed numeric and categorical data. It achieves comparable or superior performance on both synthetic and real-world datasets when compared to K-Means while offering significantly higher interpretability.
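To make the idea of a tree-based clustering model scored by an internal validation metric concrete, the following is a minimal sketch, not the authors' method. It assumes the silhouette score as the metric (the abstract does not name one) and replaces Mixed Integer Optimization with an exhaustive search over a single axis-aligned split, which amounts to a depth-1 tree. The function names `silhouette` and `best_split` and the toy data are hypothetical.

```python
import math


def silhouette(points, labels):
    """Mean silhouette score for points (tuples) with cluster ids in labels."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        return -1.0  # undefined for one cluster; treated here as the worst case
    total = 0.0
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # common convention: a singleton contributes 0
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def best_split(points):
    """Try every axis-aligned split x[feature] < threshold; keep the best score."""
    best = (-1.0, None, None)
    for f in range(len(points[0])):
        values = sorted({p[f] for p in points})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            labels = [0 if p[f] < t else 1 for p in points]
            score = silhouette(points, labels)
            if score > best[0]:
                best = (score, f, t)
    return best


# Toy data (assumed): two groups separated along feature 0.
points = [(1.0, 2.0), (1.2, 2.4), (0.8, 1.9), (5.0, 2.1), (5.2, 2.3), (4.9, 2.0)]
score, feature, threshold = best_split(points)
print(f"cluster 0 if x[{feature}] < {threshold:.2f} else cluster 1 "
      f"(silhouette {score:.3f})")
```

The resulting rule is a single readable condition on one feature, which is the kind of explanation that K-Means centroids do not provide directly. A deeper tree, and a search over all splits jointly rather than one at a time, would be needed to approach what the abstract describes.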

research
12/10/2021

Interpretable Clustering via Multi-Polytope Machines

Clustering is a popular unsupervised learning tool often used to discove...
research
01/30/2023

Optimal Decision Trees For Interpretable Clustering with Constraints

Constrained clustering is a semi-supervised task that employs a limited ...
research
12/08/2020

Optimal Survival Trees

Tree-based models are increasingly popular due to their ability to ident...
research
10/19/2022

Margin Optimal Classification Trees

In recent years there has been growing attention to interpretable machin...
research
05/07/2023

A Generalized Framework for Predictive Clustering and Optimization

Clustering is a powerful and extensively used data science tool. While c...
research
10/18/2022

Clustering Categorical Data: Soft Rounding k-modes

Over the last three decades, researchers have intensively explored vario...
research
10/11/2021

Density-based interpretable hypercube region partitioning for mixed numeric and categorical data

Consider a structured dataset of features, such as {SEX, INCOME, RACE, E...
