Explaining Classification Models Built on High-Dimensional Sparse Data

07/21/2016
by Julie Moeyersoms, et al.

Predictive modeling applications increasingly use data representing people's behavior, opinions, and interactions. Fine-grained behavior data often have a different structure from traditional data, being very high-dimensional and sparse. Models built from these data are quite difficult to interpret, since they contain many thousands or even many millions of features. Listing features with large model coefficients is not sufficient, because the model coefficients do not incorporate information on feature presence, which is key when analyzing sparse data. In this paper we introduce two alternatives for explaining predictive models by listing important features. We evaluate these alternatives in terms of explanation "bang for the buck," i.e., how many examples' inferences are explained for a given number of features listed. The bottom line: (i) the proposed alternatives deliver double the bang for the buck compared with just listing the high-coefficient features, and (ii) interestingly, although they come from different sources and motivations, the two new alternatives provide strikingly similar rankings of important features.
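
To make the point about feature presence concrete, here is a minimal sketch on toy data. It is not the paper's method: the abstract does not name the two proposed alternatives, so the presence-weighted ranking (absolute coefficient times the number of examples in which the feature is active) and the coverage measure (share of examples with at least one listed feature active) are assumptions chosen only for illustration. All function names and data are hypothetical.

```python
def rank_by_coefficient(coefs):
    """Feature indices sorted by absolute coefficient, largest first."""
    return sorted(range(len(coefs)), key=lambda j: abs(coefs[j]), reverse=True)


def rank_by_presence_weighted(coefs, rows):
    """Assumed alternative: weight |coefficient| by how often the feature is active."""
    counts = [0] * len(coefs)
    for active in rows:
        for j in active:
            counts[j] += 1
    return sorted(
        range(len(coefs)),
        key=lambda j: abs(coefs[j]) * counts[j],
        reverse=True,
    )


def coverage(ranking, rows, k):
    """Assumed proxy for 'explained': share of examples with a top-k feature active."""
    top = set(ranking[:k])
    covered = sum(1 for active in rows if top & active)
    return covered / len(rows)


# Toy sparse data: each example is the set of its active (nonzero) feature indices.
coefs = [3.0, 2.5, 1.0, 0.8]
rows = [{2}, {2, 3}, {3}, {2}, {0}, {3}, {2}, {1}]

by_coef = rank_by_coefficient(coefs)
by_presence = rank_by_presence_weighted(coefs, rows)

print("coefficient ranking:", by_coef, "coverage@2:", coverage(by_coef, rows, 2))
print("presence-weighted ranking:", by_presence, "coverage@2:", coverage(by_presence, rows, 2))
```

In this toy setup the two highest-coefficient features are each active in only one example, so listing them covers few examples, while the presence-weighted list favors features that actually occur in the data.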
