Feature Perturbation Augmentation for Reliable Evaluation of Importance Estimators

by Lennart Brocki et al.

Post-hoc explanation methods attempt to make the inner workings of deep neural networks more interpretable. However, since a ground truth is generally lacking, local post-hoc interpretability methods, which assign importance scores to input features, are challenging to evaluate. One of the most popular evaluation frameworks is to perturb features deemed important by an interpretability method and to measure the resulting change in prediction accuracy. Intuitively, a large decrease in prediction accuracy indicates that the explanation has correctly quantified the importance of features with respect to the prediction outcome (e.g., logits). However, the change in the prediction outcome may also stem from perturbation artifacts: perturbed samples in the test dataset are out of distribution (OOD) with respect to the training dataset and can therefore affect the model in unexpected ways. To overcome this challenge, we propose feature perturbation augmentation (FPA), which creates and adds perturbed images during model training. Through extensive computational experiments, we demonstrate that FPA makes deep neural networks (DNNs) more robust against perturbations. Furthermore, DNNs trained with FPA suggest that the sign of importance scores may be more meaningful for explaining the model than previously assumed. Overall, FPA is an intuitive data augmentation technique that improves the evaluation of post-hoc interpretability methods.
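The perturbation step underlying both the evaluation framework and FPA can be sketched as follows. This is a minimal, hypothetical illustration (the function name, baseline value, and top-k selection are assumptions, not the paper's exact procedure): the pixels with the highest importance scores are replaced with a baseline value, either at evaluation time to measure the accuracy drop, or at training time so that such perturbed inputs are no longer out of distribution.

```python
import numpy as np

def perturb_top_k(image, importance, k, baseline=0.0):
    """Replace the k pixels with the highest importance scores by a baseline.

    Hypothetical sketch: during evaluation this simulates removing the
    features an interpretability method deems most important; under FPA
    the same operation is applied to training images as augmentation.
    """
    flat_importance = importance.ravel()
    # Indices of the k largest importance scores (order among them is arbitrary).
    top_idx = np.argpartition(flat_importance, -k)[-k:]
    perturbed = image.copy()
    perturbed.ravel()[top_idx] = baseline
    return perturbed

# Toy example: a 4x4 "image" whose importance map marks two top-row pixels.
img = np.arange(16, dtype=float).reshape(4, 4)
imp = np.zeros((4, 4))
imp[0, :2] = 1.0
out = perturb_top_k(img, imp, k=2)
```

In this toy example only the two marked pixels are set to the baseline; all other pixel values are left untouched, and the original image is not modified in place.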


