Generalized Optimal Sparse Decision Trees

06/15/2020
by Jimmy Lin, et al.

Decision tree optimization is notoriously difficult from a computational perspective but essential for the field of interpretable machine learning. Despite efforts over the past 40 years, only recently have optimization breakthroughs been made that allow practical algorithms to find optimal decision trees. These new techniques have the potential to trigger a paradigm shift in which sparse decision trees can be constructed to efficiently optimize a variety of objective functions, without relying on greedy splitting and pruning heuristics that often lead to suboptimal solutions. The contribution of this work is a general framework for decision tree optimization that addresses the two significant open problems in the area: treatment of imbalanced data and fully optimizing over continuous variables. We present techniques that produce optimal decision trees over a variety of objectives, including F-score, AUC, and partial area under the ROC convex hull. We also introduce a scalable algorithm that produces provably optimal results in the presence of continuous variables and speeds up decision tree construction by several orders of magnitude relative to the state of the art.
