Debugging Tests for Model Explanations

11/10/2020
by Julius Adebayo et al.

We investigate whether post-hoc model explanations are effective for diagnosing model errors, a task we refer to as model debugging. In response to the challenge of explaining a model's prediction, a vast array of explanation methods has been proposed. Despite their increasing use, it is unclear whether they are effective. To start, we categorize bugs, based on their source, into data, model, and test-time contamination bugs. For several explanation methods, we assess their ability to: detect spurious correlation artifacts (data contamination), diagnose mislabeled training examples (data contamination), differentiate between a (partially) re-initialized model and a trained one (model contamination), and detect out-of-distribution inputs (test-time contamination). We find that the methods tested are able to diagnose a spurious background bug, but cannot conclusively identify mislabeled training examples. In addition, a class of methods that modify the back-propagation algorithm is invariant to the higher-layer parameters of a deep network and hence ineffective for diagnosing model contamination. We complement our analysis with a human-subject study and find that subjects fail to identify defective models using attributions, relying instead primarily on model predictions. Taken together, our results provide guidance for practitioners and researchers turning to explanations as tools for model debugging.
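The model-contamination test described above can be illustrated with a minimal sketch: compute an input-gradient attribution for a small network, re-initialize the top-layer weights, and compare the two attribution maps. This is not the paper's implementation; the toy two-layer ReLU model and the `attribution` helper are hypothetical, and input-gradient is used only as a representative attribution method. A method that is sensitive to the higher-layer parameters should produce clearly different maps; an invariant method (as the paper finds for some modified back-propagation approaches) would not.

```python
import numpy as np

def attribution(x, W1, w2):
    """Input-gradient attribution for the toy model f(x) = w2 . relu(W1 @ x)."""
    h = W1 @ x
    mask = (h > 0).astype(float)        # ReLU gate
    return W1.T @ (mask * w2)           # gradient of f with respect to x

rng = np.random.default_rng(0)
x = rng.normal(size=5)                  # a single input
W1 = rng.normal(size=(8, 5))            # "trained" first layer (stand-in)
w2 = rng.normal(size=8)                 # "trained" top layer (stand-in)

a_trained = attribution(x, W1, w2)

# Model-contamination probe: re-initialize the top layer only.
w2_random = rng.normal(size=8)
a_random = attribution(x, W1, w2_random)

# Cosine similarity between the two attribution maps.
cos = a_trained @ a_random / (
    np.linalg.norm(a_trained) * np.linalg.norm(a_random)
)
print(f"cosine similarity after top-layer re-init: {cos:.3f}")
```

For input-gradient, the similarity drops well below 1 because the gradient depends directly on the top-layer weights; a modified-backprop method that discards that dependence would report near-identical maps, which is exactly why such methods cannot flag this bug.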


