Debiased-CAM for bias-agnostic faithful visual explanations of deep convolutional networks

by Wencan Zhang, et al.

Class activation maps (CAMs) explain convolutional neural network predictions by identifying salient pixels, but they become misaligned and misleading when explaining predictions on biased images, such as images blurred accidentally or deliberately for privacy protection, or images with improper white balance. Even after the model is fine-tuned to improve prediction performance on these biased images, we demonstrate that CAM explanations deviate further and become less faithful as image bias increases. We present Debiased-CAM to recover explanation faithfulness across various bias types and levels by training a multi-input, multi-task model with auxiliary tasks for CAM and bias-level prediction. With CAM as a prediction task, explanations are made tunable by retraining the main model layers, and made faithful by self-supervised learning from the CAMs of unbiased images. The model provides representative, bias-agnostic CAM explanations of predictions on biased images, as if generated from their unbiased form. In four simulation studies with different biases and prediction tasks, Debiased-CAM improved both CAM faithfulness and task performance. We further conducted two controlled user studies to validate its truthfulness and helpfulness, respectively. Quantitative and qualitative analyses of participant responses confirmed Debiased-CAM as more truthful and helpful. Debiased-CAM thus provides a basis to generate more faithful and relevant explanations for a wide range of real-world applications with various sources of bias.
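The multi-task training objective described above can be sketched as a weighted sum of three terms: the classification loss, a self-supervised CAM regression loss against the CAM of the unbiased image, and a bias-level regression loss. The following is a minimal NumPy sketch under assumed names and weights; `w_cam` and `w_bias` are illustrative hyperparameters, not values from the paper.

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def debiased_cam_loss(logits, label, pred_cam, teacher_cam,
                      pred_bias, true_bias, w_cam=1.0, w_bias=0.1):
    """Sketch of a multi-task loss combining the three tasks:
    - cross-entropy for the main classification task,
    - MSE between the predicted CAM and the 'teacher' CAM computed
      on the unbiased image (self-supervision for faithfulness),
    - MSE for the auxiliary bias-level prediction task.
    Weights w_cam and w_bias are assumptions for illustration."""
    ce = -np.log(softmax(logits)[label])             # classification
    cam_mse = np.mean((pred_cam - teacher_cam) ** 2) # CAM faithfulness
    bias_mse = (pred_bias - true_bias) ** 2          # bias-level task
    return ce + w_cam * cam_mse + w_bias * bias_mse
```

In a real implementation these terms would be computed batch-wise in a deep learning framework and backpropagated jointly, so that retraining the shared layers tunes the CAM explanations toward those of the unbiased inputs.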


