(Decision and regression) tree ensemble based kernels for regression and classification

by Dai Feng, et al.

Tree-based ensembles such as Breiman's random forest (RF) and Gradient Boosted Trees (GBT) can be interpreted as implicit kernel generators, where the ensuing proximity matrix represents the data-driven tree ensemble kernel. The kernel perspective on the RF has been used to develop a principled framework for theoretical investigation of its statistical properties. Recently, it has been shown that the kernel interpretation is germane to other tree-based ensembles, e.g., GBTs. However, the practical utility of the links between kernels and tree ensembles has not been widely explored or systematically evaluated. The focus of our work is the investigation of the interplay between kernel methods and tree-based ensembles, including the RF and GBT. We elucidate the performance and properties of the RF- and GBT-based kernels in a comprehensive simulation study comprising continuous and binary targets. We show that for continuous targets, the RF/GBT kernels are competitive with their respective ensembles in higher-dimensional scenarios, particularly in cases with a larger number of noisy features. For the binary target, the RF/GBT kernels and their respective ensembles exhibit comparable performance. We provide results from real-life data sets for regression and classification to show how these insights may be leveraged in practice. Overall, our results support the tree ensemble based kernels as a valuable addition to the practitioner's toolbox. Finally, we discuss extensions of the tree ensemble based kernels to survival targets, interpretable prototype and landmarking classification and regression. We outline a future line of research for kernels furnished by Bayesian counterparts of the frequentist tree ensembles.
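To make the notion of a proximity matrix concrete, the following is a minimal sketch, not the authors' implementation. It assumes the common definition of tree ensemble proximity: the fraction of trees in which two observations fall into the same leaf. The function name `proximity_kernel` and the toy leaf assignments are hypothetical and serve only as illustration.

```python
import numpy as np


def proximity_kernel(leaves_a, leaves_b=None):
    """Fraction of trees in which each pair of samples shares a leaf.

    leaves_a: int array, shape (n_a, n_trees); entry [i, t] is the index of
              the leaf that sample i reaches in tree t.
    leaves_b: optional int array, shape (n_b, n_trees); defaults to leaves_a.
    Returns an (n_a, n_b) array with entries in [0, 1].
    """
    leaves_a = np.asarray(leaves_a)
    leaves_b = leaves_a if leaves_b is None else np.asarray(leaves_b)
    # Compare leaf indices tree by tree, then average the matches over trees.
    same_leaf = leaves_a[:, None, :] == leaves_b[None, :, :]
    return same_leaf.mean(axis=2)


# Hypothetical leaf assignments for 3 samples in an ensemble of 4 trees.
leaves = np.array([
    [0, 2, 1, 3],
    [0, 2, 4, 3],
    [5, 1, 4, 0],
])
K = proximity_kernel(leaves)
```

In practice the leaf indices would come from a fitted ensemble; for example, scikit-learn's random forest estimators expose an `apply(X)` method that returns an array of this shape. The resulting matrix could then be passed to a kernel method that accepts a precomputed kernel, which is one way to read the comparison between kernels and their respective ensembles described above.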




