Consistency of The Oblique Decision Tree and Its Random Forest

by Haoran Zhan, et al.

The classification and regression tree (CART) and Random Forest (RF) are arguably the most popular pair of statistical learning methods. However, their statistical consistency can only be proved under very restrictive assumptions on the underlying regression function. As an extension of the standard CART, Breiman (1984) suggested using linear combinations of predictors as splitting variables. The method became known as the oblique decision tree (ODT) and has received considerable attention. ODT tends to perform better than CART and requires fewer partitions. In this paper, we further show that ODT is consistent for very general regression functions as long as they are continuous. We also prove the consistency of ODT-based random forests (ODRF) that use either a fixed-size or a random-size subset of features in feature bagging: the latter is guaranteed to be consistent for general regression functions, whereas the former is consistent only for functions with specific structures. After refining the existing computer packages according to the established theory, our numerical experiments also show that ODRF has a noticeable overall improvement over RF and other decision forests.
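To make the difference between an axis-aligned CART split and an oblique split concrete, the following is a minimal sketch of a single regression split on a projection `X @ w`. The abstract does not say how the linear combinations are chosen, so the random search over directions here is an assumption made purely for illustration and is not the authors' procedure. The function names `best_threshold` and `oblique_split` and the parameter `n_directions` are hypothetical.

```python
import numpy as np


def best_threshold(z, y):
    """Scan all thresholds on the 1-D values z and return (sse, threshold)
    for the split that minimizes the within-node sum of squared errors."""
    order = np.argsort(z)
    z, y = z[order], y[order]
    n = len(y)
    csum = np.cumsum(y)          # running sum of responses
    csq = np.cumsum(y ** 2)      # running sum of squared responses
    best_sse, best_t = np.inf, None
    for i in range(1, n):
        if z[i] == z[i - 1]:
            continue             # cannot split between tied values
        left_sse = csq[i - 1] - csum[i - 1] ** 2 / i
        right_sum = csum[-1] - csum[i - 1]
        right_sse = (csq[-1] - csq[i - 1]) - right_sum ** 2 / (n - i)
        sse = left_sse + right_sse
        if sse < best_sse:
            best_sse, best_t = sse, 0.5 * (z[i - 1] + z[i])
    return best_sse, best_t


def oblique_split(X, y, n_directions=200, seed=0):
    """Illustrative oblique split: try the coordinate axes (the CART
    candidates) plus random unit directions, and keep the best one.
    The random search is an assumption, not the paper's method."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    dirs = np.vstack([np.eye(p), rng.normal(size=(n_directions, p))])
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    best = (np.inf, None, None)
    for w in dirs:
        sse, t = best_threshold(X @ w, y)
        if sse < best[0]:
            best = (sse, w, t)
    return best  # (sse, direction w, threshold t)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.uniform(-1, 1, size=(300, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(float)   # diagonal decision boundary
    axis_sse = min(best_threshold(X[:, j], y)[0] for j in range(2))
    sse, w, t = oblique_split(X, y)
    print("best axis-aligned SSE:", axis_sse)
    print("best oblique SSE:     ", sse, "direction:", w, "threshold:", t)
```

Because the coordinate axes are among the candidate directions, the split this sketch returns is never worse than the best axis-aligned split on the same data. On a boundary such as `x1 + x2 > 0` it can be much better, which is one way to read the claim that ODT requires fewer partitions.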




