GANMEX: One-vs-One Attributions using GAN-based Model Explainability

11/11/2020
by Sheng-Min Shih, et al.

Attribution methods are promising approaches for identifying the key features behind a learned model's predictions. While most existing attribution methods rely on a baseline input for performing feature perturbations, limited research has addressed how that baseline should be selected. Poor baseline choices prevent one-vs-one explanations for multi-class classifiers, i.e., the attribution methods cannot explain why an input belongs to its original class rather than another specified target class. One-vs-one explanations are crucial when certain classes are more similar than others, e.g., two bird types among multiple animals, because they focus on the key differentiating features rather than on features shared across classes. In this paper, we present GANMEX, a novel approach that applies Generative Adversarial Networks (GANs) by incorporating the to-be-explained classifier into the adversarial networks. Our approach selects the baseline as the closest realistic sample belonging to the target class, which allows attribution methods to provide true one-vs-one explanations. We show that GANMEX baselines improve saliency maps and achieve stronger performance on perturbation-based evaluation metrics than existing baselines. Existing attribution results are also known to be insensitive to model randomization, and we demonstrate that GANMEX baselines lead to better outcomes under cascading randomization of the model.
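As a rough illustration of the idea (not the paper's exact training procedure, which incorporates the classifier into the adversarial training itself), the sketch below instead optimizes over a pretrained GAN's latent space toward the stated objective: a baseline that is realistic, assigned to the target class by the to-be-explained classifier, and close to the original input. `generator`, `classifier`, `latent_dim`, and the loss weighting are hypothetical stand-ins, assuming a PyTorch setup.

```python
# Minimal sketch of the GANMEX baseline objective, assuming PyTorch.
# NOT the paper's exact algorithm: this searches a pretrained GAN's
# latent space for a sample that is (a) realistic (on the GAN manifold),
# (b) assigned to the target class by the classifier being explained,
# and (c) close to the original input.
import torch
import torch.nn.functional as F

def ganmex_style_baseline(x, target_class, generator, classifier,
                          latent_dim=128, steps=500, lr=0.05,
                          dist_weight=1.0):
    """Return a realistic sample near `x` that `classifier` assigns
    to `target_class`, for use as a one-vs-one attribution baseline."""
    z = torch.randn(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    target = torch.tensor([target_class])
    for _ in range(steps):
        opt.zero_grad()
        x_b = generator(z)                                    # stays on the GAN manifold
        cls_loss = F.cross_entropy(classifier(x_b), target)   # belongs to the target class
        dist_loss = F.mse_loss(x_b, x)                        # stays close to the input
        (cls_loss + dist_weight * dist_loss).backward()
        opt.step()
    with torch.no_grad():
        return generator(z)
```

The resulting baseline can then be passed to any baseline-dependent attribution method, such as Integrated Gradients, so that the attribution highlights the features differentiating the original class from the chosen target class rather than features shared by both.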


