A Simple and Effective Model-Based Variable Importance Measure

05/12/2018
by Brandon M. Greenwell, et al.

In the era of "big data", it is increasingly challenging not only to build state-of-the-art predictive models but also to understand what is really going on in the data. For example, it is often of interest to know which, if any, of the predictors in a fitted model are relatively influential on the predicted outcome. Some modern algorithms, such as random forests and gradient boosted decision trees, have a natural way of quantifying the importance or relative influence of each feature. Other algorithms, such as naive Bayes classifiers and support vector machines, have no such built-in measure, so model-free approaches are generally used to measure each predictor's importance. In this paper, we propose a standardized, model-based approach to measuring predictor importance across the growing spectrum of supervised learning algorithms. Our proposed method is illustrated through both simulated and real data examples. The R code to reproduce all of the figures in this paper is available in the supplementary materials.
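To make the idea of a model-based importance score concrete, the sketch below scores each predictor by how much a model's average prediction changes as that predictor is varied (the spread of its partial dependence curve). This is an illustrative assumption about one way such a measure could be built, not necessarily the measure the paper proposes; the abstract does not specify it. The names `toy_model`, `partial_dependence`, and `pd_importance` are hypothetical, and `toy_model` stands in for the prediction function of any fitted model.

```python
import random
import statistics


def partial_dependence(predict, rows, feature, grid):
    """Average prediction over the data with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        preds = []
        for row in rows:
            modified = list(row)
            modified[feature] = value  # overwrite one predictor, keep the rest
            preds.append(predict(modified))
        curve.append(statistics.fmean(preds))
    return curve


def pd_importance(predict, rows, feature, grid_size=20):
    """Score a predictor by the standard deviation of its partial dependence curve.

    A flat curve (score near zero) suggests the model barely uses the predictor.
    """
    column = [row[feature] for row in rows]
    lo, hi = min(column), max(column)
    grid = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    return statistics.stdev(partial_dependence(predict, rows, feature, grid))


def toy_model(row):
    # Hypothetical stand-in for a fitted model's predict function:
    # strong linear effect of x0, weak nonlinear effect of x1, no effect of x2.
    x0, x1, x2 = row
    return 3.0 * x0 + 0.5 * x1 ** 2


rng = random.Random(0)
rows = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
scores = [pd_importance(toy_model, rows, j) for j in range(3)]
print(scores)
```

Because the score only needs a prediction function, the same procedure applies whether the underlying model is a random forest, a support vector machine, or a naive Bayes classifier, which is the kind of uniformity the abstract calls for.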


