Generalization Properties of Decision Trees on Real-valued and Categorical Features

10/18/2022
by Jean-Samuel Leboeuf, et al.

We revisit binary decision trees from the perspective of partitions of the data. We introduce the notion of a partitioning function and relate it to the growth function and to the VC dimension. We consider three types of features: real-valued, categorical ordinal, and categorical nominal, each with its own split rule. For each feature type, we upper bound the partitioning function of the class of decision stumps, then extend the bounds to the class of general decision trees (of any fixed structure) using a recursive approach. Using these new results, we find the exact VC dimension of decision stumps on examples of ℓ real-valued features, which is given by the largest integer d such that $2\ell \ge \binom{d}{\lfloor d/2 \rfloor}$. Furthermore, we show that the VC dimension of a binary tree structure with L_T leaves on examples of ℓ real-valued features is in $O(L_T \log(L_T \ell))$. Finally, we develop a pruning algorithm based on these results that performs better than the cost-complexity and reduced-error pruning algorithms on a number of data sets, with the advantage that no cross-validation is required.
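
To make the decision-stump result concrete, the following minimal sketch computes the largest integer d satisfying the stated condition, reading the right-hand side as the binomial coefficient "d choose ⌊d/2⌋". The function name `stump_vc_dimension` and its parameter `num_features` are illustrative choices, not names taken from the paper.

```python
from math import comb


def stump_vc_dimension(num_features: int) -> int:
    """Largest d with 2 * num_features >= C(d, floor(d / 2)).

    Assumption: the condition in the abstract is read as a binomial
    coefficient. The name and signature are hypothetical.
    """
    if num_features < 1:
        raise ValueError("num_features must be a positive integer")
    d = 1  # C(1, 0) = 1 <= 2 * num_features always holds
    # C(d, floor(d / 2)) is non-decreasing in d, so a linear scan
    # stops at the first d + 1 that violates the condition.
    while comb(d + 1, (d + 1) // 2) <= 2 * num_features:
        d += 1
    return d


if __name__ == "__main__":
    for n in (1, 2, 3, 5, 10, 100):
        print(n, stump_vc_dimension(n))
```

Because the central binomial coefficient grows exponentially in d, the value returned grows only logarithmically with the number of features.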
