Consistent Estimation of Residual Variance with Random Forest Out-Of-Bag Errors
The problem of estimating residual variance in regression models has received relatively little attention in the machine learning community. However, the estimate is of primary interest in many practical applications, e.g. as a first step towards the construction of prediction intervals. Here, we consider this problem for the random forest. There, the functional relationship between covariates and the response variable is modeled as a weighted sum of the observed responses. The weights, however, are built during the tree construction process and carry an involved dependence structure, which complicates the mathematical analysis of the model. Restricting to L2-consistent random forest models, we provide random forest based residual variance estimators and prove their consistency.
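As a rough illustration of the idea, the sketch below estimates the residual variance as the mean squared out-of-bag (OOB) residual of a fitted forest. It is a minimal sketch under stated assumptions, not the estimators defined in the paper: the simulated data, the regression function, the noise level sigma2_true and the forest settings are all hypothetical choices, and scikit-learn's RandomForestRegressor stands in for the random forest models treated here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical simulated regression problem: y = m(x) + noise,
# with known residual variance sigma2_true (an assumption for illustration).
rng = np.random.default_rng(0)
n, p = 1000, 5
sigma2_true = 0.25
X = rng.uniform(-1.0, 1.0, size=(n, p))
m = np.sin(np.pi * X[:, 0]) + X[:, 1] ** 2
y = m + rng.normal(0.0, np.sqrt(sigma2_true), size=n)

# Fit a bagged forest and keep the out-of-bag predictions:
# each y_i is predicted only by trees whose bootstrap sample excluded i.
forest = RandomForestRegressor(
    n_estimators=300,
    bootstrap=True,
    oob_score=True,
    min_samples_leaf=5,
    random_state=0,
)
forest.fit(X, y)

# Mean squared OOB residual as a plug-in estimate of the residual variance.
oob_residuals = y - forest.oob_prediction_
sigma2_hat = float(np.mean(oob_residuals ** 2))

print(f"true residual variance: {sigma2_true:.3f}")
print(f"OOB-based estimate:     {sigma2_hat:.3f}")
```

In finite samples the mean squared OOB residual contains both the noise variance and the forest's own estimation error, so it tends to sit above the true value. The restriction to L2-consistent forests is what lets that second term vanish as the sample size grows.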