Nonparametric Variable Screening with Optimal Decision Stumps

11/05/2020
by   Jason M. Klusowski, et al.

Decision trees and their ensembles are endowed with a rich set of diagnostic tools for ranking and screening input variables in a predictive model. One of the most commonly used in practice is the Mean Decrease in Impurity (MDI), which computes an importance score for a variable by summing the weighted impurity reductions over all non-terminal nodes split on that variable. Despite the widespread use of tree-based variable importance measures such as MDI, pinning down their theoretical properties has been challenging, and they therefore remain largely unexplored. To address this gap between theory and practice, we derive rigorous finite-sample performance guarantees for variable ranking and selection in nonparametric models with MDI for a single-level CART decision tree (a decision stump). We find that the marginal signal strength of each variable can be considerably weaker, and the ambient dimensionality considerably higher, than what state-of-the-art nonparametric variable selection methods require. Furthermore, unlike previous marginal screening methods that attempt to directly estimate each marginal projection via a truncated basis expansion, the fitted model used here is a simple, parsimonious decision stump, thereby eliminating the need to tune the number of basis terms. Thus, surprisingly, even though decision stumps are highly inaccurate for estimation purposes, they can still be used to perform consistent model selection.
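To make the screening procedure concrete, here is a minimal sketch of MDI-based variable ranking with decision stumps in NumPy. For each variable, it fits an optimal stump (the split minimizing within-node squared error, i.e. CART's variance-impurity criterion) and records the resulting weighted impurity decrease; variables are then ranked by that score. Function names and the synthetic-data setup are illustrative, not from the paper.

```python
import numpy as np

def stump_impurity_reduction(x, y):
    """Best weighted impurity (variance) decrease achievable by a decision
    stump splitting on a single variable x -- the MDI score of a stump."""
    order = np.argsort(x)
    y_sorted = y[order]
    n = len(y)
    total_sse = np.sum((y - y.mean()) ** 2)
    # Prefix sums let us evaluate every split point in O(n) total.
    csum = np.cumsum(y_sorted)
    csq = np.cumsum(y_sorted ** 2)
    best = 0.0
    for i in range(1, n):  # left child = first i sorted points
        nl, nr = i, n - i
        sl = csum[i - 1]
        sr = csum[-1] - sl
        sse_left = csq[i - 1] - sl ** 2 / nl
        sse_right = (csq[-1] - csq[i - 1]) - sr ** 2 / nr
        best = max(best, total_sse - sse_left - sse_right)
    return best / n  # normalize by sample size (weighted decrease)

def mdi_screen(X, y, k):
    """Rank variables by their stump MDI score; return the top k indices."""
    scores = np.array([stump_impurity_reduction(X[:, j], y)
                       for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k], scores

# Illustrative usage: y depends only on the first of 10 variables,
# so MDI screening should rank variable 0 first.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=500)
top, scores = mdi_screen(X, y, 3)
```

Note that no smoothing parameter or basis size is tuned anywhere: each variable's score comes from a single optimal split, which is the parsimony the abstract highlights.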


