Robust Semantic Interpretability: Revisiting Concept Activation Vectors

04/06/2021
by   Jacob Pfau, et al.

Interpretability methods for image classification assess model trustworthiness by attempting to expose whether the model is systematically biased or attending to the same cues as a human would. Saliency methods for feature attribution dominate the interpretability literature, but these methods do not address semantic concepts such as the textures, colors, or genders of objects within an image. Our proposed method, Robust Concept Activation Vectors (RCAV), quantifies the effects of semantic concepts on individual model predictions and on model behavior as a whole. RCAV calculates a concept gradient and takes a gradient ascent step to assess model sensitivity to the given concept. By generalizing previous work on concept activation vectors to account for model non-linearity, and by introducing stricter hypothesis testing, we show that RCAV yields interpretations that are both more accurate at the image level and more robust at the dataset level. RCAV, like saliency methods, supports the interpretation of individual predictions. To evaluate the practical use of interpretability methods as debugging tools, and the scientific use of interpretability methods for identifying inductive biases (e.g. texture over shape), we construct two datasets and accompanying metrics for realistic benchmarking of semantic interpretability methods. Our benchmarks expose the importance of counterfactual augmentation and negative controls for quantifying the practical usability of interpretability methods.
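The abstract describes RCAV at a high level: fit a linear concept direction in an intermediate layer's activation space, then take a gradient-ascent-style step along that direction and measure how the model's prediction changes. The sketch below illustrates that idea only; it is not the authors' implementation. The split of the network into `bottom` (input to intermediate activations) and `top` (intermediate activations to logits), the function names, and the step size `alpha` are assumptions for illustration, and everything is assumed to live on CPU.

```python
# Minimal RCAV-style sketch (hypothetical API, not the reference implementation).
import torch
from sklearn.linear_model import LogisticRegression


def concept_activation_vector(bottom, concept_imgs, control_imgs):
    """Fit a linear probe separating concept from control activations;
    its normalized weight vector is used as the concept direction."""
    with torch.no_grad():
        a_pos = bottom(concept_imgs).flatten(1)   # (n_pos, d)
        a_neg = bottom(control_imgs).flatten(1)   # (n_neg, d)
    X = torch.cat([a_pos, a_neg]).cpu().numpy()
    y = [1] * len(a_pos) + [0] * len(a_neg)
    probe = LogisticRegression(max_iter=1000).fit(X, y)
    cav = torch.tensor(probe.coef_[0], dtype=a_pos.dtype)
    return cav / cav.norm()


def concept_sensitivity(bottom, top, image, cav, target_class, alpha=1.0):
    """Step the intermediate activation along the concept direction by alpha
    and report the resulting change in the target-class softmax score."""
    with torch.no_grad():
        act = bottom(image.unsqueeze(0))
        p0 = torch.softmax(top(act), dim=1)[0, target_class]
        act_stepped = (act.flatten(1) + alpha * cav).reshape(act.shape)
        p1 = torch.softmax(top(act_stepped), dim=1)[0, target_class]
    return (p1 - p0).item()
```

Per the abstract, RCAV additionally aggregates such per-image sensitivity scores across a dataset and applies stricter hypothesis testing with negative controls; those steps are omitted from this sketch.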

Related research

01/17/2023 - Opti-CAM: Optimizing saliency maps for interpretability
06/27/2020 - Improving Interpretability of CNN Models Using Non-Negative Concept Activation Vectors
08/23/2023 - Concept Bottleneck with Visual Concept Filtering for Explainable Medical Image Classification
12/22/2021 - Comparing radiologists' gaze and saliency maps generated by interpretability methods for chest x-rays
07/13/2023 - Uncovering Unique Concept Vectors through Latent Space Decomposition
06/08/2021 - On the Lack of Robust Interpretability of Neural Text Classifiers
07/19/2021 - Path Integrals for the Attribution of Model Uncertainties
