Learning Optimal Classification Trees: Strong Max-Flow Formulations

02/21/2020
by Sina Aghaei, et al.

We consider the problem of learning optimal binary classification trees. Literature on the topic has burgeoned in recent years, motivated both by the empirical suboptimality of heuristic approaches and by the tremendous improvements in mixed-integer programming (MIP) technology. Yet existing approaches do not leverage the power of MIP to its full extent: they rely on weak formulations, resulting in slow convergence and large optimality gaps. To fill this gap in the literature, we propose a flow-based MIP formulation for optimal binary classification trees that has a stronger linear programming relaxation. Our formulation has an attractive decomposable structure. We exploit this structure, together with max-flow/min-cut duality, to derive a Benders' decomposition method that scales to larger instances. We conduct extensive computational experiments on standard benchmark datasets and show that our proposed approaches are 50 times faster than state-of-the-art MIP-based techniques and improve out-of-sample performance by up to 13.8%.
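The flow idea behind such a formulation can be illustrated on a tree that is already fixed. The sketch below is not the authors' MIP model; it only shows the intuition under several assumptions that are not stated in the abstract: features are binary, a datapoint goes left when the tested feature is 0 and right when it is 1, and only leaves carry predictions. Under these assumptions, one unit of flow can travel from the source (the root) to the sink exactly when the datapoint reaches a leaf whose label matches its class, so the total flow over all datapoints equals the number of correct classifications. The names `Node`, `max_flow`, and `num_correct` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    feature: Optional[int] = None    # binary feature tested at an internal node
    left: Optional["Node"] = None    # child followed when x[feature] == 0
    right: Optional["Node"] = None   # child followed when x[feature] == 1
    label: Optional[int] = None      # predicted class at a leaf


def max_flow(node: Node, x: Sequence[int], y: int) -> int:
    """Maximum source-to-sink flow for one datapoint in a fixed tree (0 or 1).

    Assumption: the arc into a child has capacity 1 only if the datapoint is
    routed to that child, and the leaf-to-sink arc has capacity 1 only if the
    leaf label equals the true class y.
    """
    if node.label is not None:
        return 1 if node.label == y else 0
    child = node.right if x[node.feature] == 1 else node.left
    return max_flow(child, x, y)


def num_correct(tree: Node, X, Y) -> int:
    """Total flow over all datapoints, i.e. the number classified correctly."""
    return sum(max_flow(tree, x, y) for x, y in zip(X, Y))


# A small hand-built tree of depth 2 and four toy datapoints.
tree = Node(
    feature=0,
    left=Node(label=0),
    right=Node(feature=1, left=Node(label=0), right=Node(label=1)),
)
X = [(0, 0), (1, 0), (1, 1), (0, 1)]
Y = [0, 0, 1, 1]
correct = num_correct(tree, X, Y)
print(correct)
```

In an optimization model the tree itself is a decision variable, so the arc capacities depend on the branching and prediction decisions. With those decisions fixed, each datapoint's subproblem is a small max-flow problem of this kind, which is the sort of structure that max-flow/min-cut duality and a Benders-style decomposition can exploit.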


research
03/29/2021

Strong Optimal Classification Trees

Decision trees are among the most popular machine learning models and ar...
research
06/10/2022

Mixed integer linear optimization formulations for learning optimal binary classification trees

Decision trees are powerful tools for classification and regression that...
research
01/09/2022

A Knapsack Intersection Hierarchy Applied to All-or-Nothing Flow in Trees

We introduce a natural knapsack intersection hierarchy for strengthening...
research
04/21/2023

Rolling Lookahead Learning for Optimal Classification Trees

Classification trees continue to be widely adopted in machine learning a...
research
08/22/2023

A Tight Formulation for the Dial-a-Ride Problem

Ridepooling services play an increasingly important role in modern trans...
research
03/22/2023

Benders decomposition algorithms for minimizing the spread of harmful contagions in networks

The COVID-19 pandemic has been a recent example for the spread of a harm...
research
07/12/2023

Outlier detection in regression: conic quadratic formulations

In many applications, when building linear regression models, it is impo...
