A Comparison of Resampling and Recursive Partitioning Methods in Random Forest for Estimating the Asymptotic Variance Using the Infinitesimal Jackknife

06/19/2017
by   Cole Brokamp, et al.

The infinitesimal jackknife (IJ) has recently been applied to the random forest to estimate its prediction variance. The underlying theorems were verified under a traditional random forest framework that uses classification and regression trees (CART) and bootstrap resampling. However, random forests using conditional inference (CI) trees and subsampling have been found not to be prone to variable selection bias. Here, we conduct simulation experiments using a novel approach to explore the applicability of the IJ to random forests using variations on the resampling method and base learner. Test data points were simulated, and each was predicted using random forests trained on one hundred simulated training data sets using different combinations of resampling and base learners. Using CI trees instead of traditional CART trees, as well as using subsampling instead of bootstrap sampling, resulted in a much more accurate estimation of prediction variance when using the IJ. The random forest variations here have been incorporated into an open source software package for the R programming language.
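To make the IJ idea concrete, the following is a minimal Python sketch of the commonly cited form of the estimate for a bagged predictor: for each training observation, take the covariance across resamples between its in-bag count and the per-tree prediction at a test point, then sum the squared covariances. It is an illustration under stated assumptions, not the paper's implementation or its R package. The names `ij_variance` and `demo` are hypothetical, the "tree" is replaced by a simple resample mean, and neither a finite-B bias correction nor any adjustment for subsampling is included.

```python
import random


def ij_variance(inbag_counts, tree_preds):
    """Uncorrected IJ variance estimate at one test point (illustrative sketch).

    inbag_counts[b][i]: number of times observation i appears in resample b.
    tree_preds[b]: prediction of base learner b at the test point.
    """
    B = len(tree_preds)
    n = len(inbag_counts[0])
    mean_pred = sum(tree_preds) / B
    total = 0.0
    for i in range(n):
        mean_count = sum(inbag_counts[b][i] for b in range(B)) / B
        # Covariance over resamples between the in-bag count of
        # observation i and the per-learner prediction.
        cov = sum(
            (inbag_counts[b][i] - mean_count) * (tree_preds[b] - mean_pred)
            for b in range(B)
        ) / B
        total += cov ** 2
    return total


def demo(n=30, B=500, seed=0):
    """Toy check: the base learner is the bootstrap-sample mean (assumption)."""
    rng = random.Random(seed)
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    counts, preds = [], []
    for _ in range(B):
        c = [0] * n
        for _ in range(n):  # draw n indices with replacement
            c[rng.randrange(n)] += 1
        counts.append(c)
        preds.append(sum(ci * yi for ci, yi in zip(c, y)) / n)
    return ij_variance(counts, preds)
```

With a real forest, `inbag_counts` would come from the resampling step and `tree_preds` from each fitted tree. The stand-in learner here only keeps the sketch self-contained.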


