Consistent Individualized Feature Attribution for Tree Ensembles

02/12/2018
by   Scott M. Lundberg, et al.

Interpreting predictions from tree ensemble methods such as gradient boosting machines and random forests is important, yet feature attribution for trees is often heuristic and not individualized for each prediction. Here we show that popular feature attribution methods are inconsistent, meaning they can lower a feature's assigned importance when the true impact of that feature actually increases. This is a fundamental problem that casts doubt on any comparison between features. To address it, we turn to recent applications of game theory and develop fast exact tree solutions for SHAP (SHapley Additive exPlanation) values, which are the unique consistent and locally accurate attribution values. We then extend SHAP values to interaction effects and define SHAP interaction values. We propose a rich visualization of individualized feature attributions that improves over classic attribution summaries and partial dependence plots, and a unique "supervised" clustering (clustering based on feature attributions). We demonstrate better agreement with human intuition through a user study, exponential improvements in run time, improved clustering performance, and better identification of influential features. An implementation of our algorithm has also been merged into XGBoost and LightGBM; see http://github.com/slundberg/shap for details.
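
To make "locally accurate" concrete, the following is a minimal sketch of Shapley-value attribution computed by brute-force enumeration over feature subsets. It is not the fast tree algorithm described in the abstract, and it runs in exponential time. The names `shapley_values` and `toy_model`, the two-feature model, and the baseline-substitution value function (absent features are set to a fixed baseline input rather than handled by a conditional expectation) are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Brute-force Shapley values of f at x, relative to a baseline input.

    Assumption: a feature that is "absent" from a coalition takes its
    value from `baseline`. This is a simplification for illustration.
    """
    n = len(x)

    def value(subset):
        # Features in `subset` come from x; all others come from baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Classic Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical model: 80 if both features are on, plus 10 if feature 1 is on.
    return 80.0 * (z[0] and z[1]) + 10.0 * z[1]


x = [1, 1]
baseline = [0, 0]
phi = shapley_values(toy_model, x, baseline)

# Local accuracy: the attributions sum to f(x) - f(baseline).
gap = toy_model(x) - toy_model(baseline)
print(phi, sum(phi), gap)
```

In this toy case the attributions add up exactly to the difference between the model output at `x` and at the baseline, which is the local accuracy property. The enumeration visits every subset of the other features for each feature, so it is only practical for a handful of features.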


