Framework for Evaluating Faithfulness of Local Explanations

02/01/2022
by Sanjoy Dasgupta, et al.

We study the faithfulness of an explanation system to the underlying prediction model. We show that this can be captured by two properties, consistency and sufficiency, and introduce quantitative measures of the extent to which these hold. Interestingly, these measures depend on the test-time data distribution. For a variety of existing explanation systems, such as anchors, we analytically study these quantities. We also provide estimators and sample complexity bounds for empirically determining the faithfulness of black-box explanation systems. Finally, we experimentally validate the new properties and estimators.
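As a rough illustration of what such an empirical faithfulness check might look like, below is a minimal Python sketch. It is not the paper's actual estimators: the function names, the `anchor` predicate interface, and the plain Monte Carlo form are assumptions. It scores an anchor-style explanation against a black-box model using samples from the test-time distribution, in the spirit of the consistency and sufficiency properties described in the abstract (coverage here is only a crude stand-in for sufficiency).

```python
import numpy as np

def estimate_consistency(f, anchor, x, samples):
    """Hypothetical Monte Carlo check: among test samples covered by the
    anchor-style rule, how often does the black-box model f agree with
    its prediction at the explained point x?"""
    covered = [z for z in samples if anchor(z)]
    if not covered:
        return float("nan")  # the rule covers no test-time mass
    target = f(x)
    return sum(f(z) == target for z in covered) / len(covered)

def estimate_coverage(anchor, samples):
    """Fraction of test samples the rule applies to; a crude proxy for
    how broadly the explanation is applicable (not the paper's
    definition of sufficiency)."""
    return float(np.mean([anchor(z) for z in samples]))

# Toy usage: a linear threshold model explained by a simple box rule.
rng = np.random.default_rng(0)
samples = list(rng.normal(size=(1000, 2)))
f = lambda z: int(z[0] + z[1] > 0)          # black-box predictor
x = np.array([1.0, 1.0])                    # point being explained
anchor = lambda z: z[0] > 0.5 and z[1] > 0.5  # anchor-style predicate

print(estimate_consistency(f, anchor, x, samples))
print(estimate_coverage(anchor, samples))
```

Because both quantities are empirical means of bounded indicators over draws from the test-time distribution, standard concentration bounds (e.g., Hoeffding's inequality) control the estimation error, which is the flavor of sample complexity guarantee the abstract refers to.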


Related research:

- 09/07/2020 · Representativity and Consistency Measures for Deep Neural Network Explanations
  The adoption of machine learning in critical contexts requires a reliabl...
- 05/31/2019 · Regularizing Black-box Models for Improved Interpretability (HILL 2019 Version)
  Most of the work on interpretable machine learning has focused on design...
- 07/20/2020 · Fairwashing Explanations with Off-Manifold Detergent
  Explanation methods promise to make black-box classifiers more transpare...
- 06/21/2023 · Evaluating the overall sensitivity of saliency-based explanation methods
  We address the need to generate faithful explanations of "black box" Dee...
- 09/08/2021 · Model Explanations via the Axiomatic Causal Lens
  Explaining the decisions of black-box models has been a central theme in...
- 11/02/2020 · A Learning Theoretic Perspective on Local Explainability
  In this paper, we explore connections between interpretable machine lear...
- 04/06/2018 · Estimation of geodesic tortuosity and constrictivity in stationary random closed sets
  We investigate the problem of estimating geodesic tortuosity and constri...
