CRAFT: Concept Recursive Activation FacTorization for Explainability

by Thomas Fel, et al.

Attribution methods are a popular class of explainability methods that use heatmaps to depict the most important areas of an image that drive a model decision. Nevertheless, recent work has shown that these methods have limited utility in practice, presumably because they only highlight the most salient parts of an image (i.e., 'where' the model looked) and do not communicate any information about 'what' the model saw at those locations. In this work, we try to fill this gap with CRAFT – a novel approach to identify both 'what' and 'where' by generating concept-based explanations. We introduce three new ingredients to the automatic concept extraction literature: (i) a recursive strategy to detect and decompose concepts across layers, (ii) a novel method for a more faithful estimation of concept importance using Sobol indices, and (iii) the use of implicit differentiation to unlock Concept Attribution Maps. We conduct both human and computer vision experiments to demonstrate the benefits of the proposed approach. We show that our recursive decomposition generates meaningful and accurate concepts and that the proposed concept importance estimation technique is more faithful to the model than previous methods. When evaluating the usefulness of the method for human experimenters on a human-defined utility benchmark, we find that our approach significantly improves on two of the three test scenarios (none of the current methods, including ours, helps on the third). Overall, our study suggests that, while much work remains toward the development of general explainability methods that are useful in practical scenarios, the identification of meaningful concepts at the proper level of granularity yields useful and complementary information beyond that afforded by attribution methods.
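To make ingredient (ii) more concrete, the following is a minimal sketch of how a total Sobol index can score concept importance. It is not the authors' implementation: the abstract does not say which estimator is used, so a Jansen-style Monte Carlo estimator is assumed here, and the names `total_sobol_indices`, `concept_coeffs`, `head_weights`, and `masked_logit` are hypothetical. The idea illustrated is to perturb each concept coefficient with a random mask in [0, 1] and measure how much of the output variance is attributable to that concept.

```python
import numpy as np


def total_sobol_indices(f, k, n_samples=4096, seed=0):
    """Estimate total Sobol indices of f: [0, 1]^k -> R.

    Assumption: Jansen-style estimator, S_Ti = E[(f(A) - f(AB_i))^2] / (2 Var f),
    where AB_i equals A except that column i is resampled from B.
    """
    rng = np.random.default_rng(seed)
    A = rng.uniform(size=(n_samples, k))  # base sample of masks
    B = rng.uniform(size=(n_samples, k))  # independent resample
    y_A = f(A)
    var_y = y_A.var()
    indices = np.zeros(k)
    for i in range(k):
        AB = A.copy()
        AB[:, i] = B[:, i]  # perturb only the mask of concept i
        indices[i] = np.mean((y_A - f(AB)) ** 2) / (2.0 * var_y)
    return indices


# Hypothetical toy setup: three concept coefficients for one image and a
# linear "classifier head" mapping masked coefficients to a single logit.
concept_coeffs = np.array([1.5, 1.0, 2.0])
head_weights = np.array([2.0, 1.0, 0.0])  # the third concept is ignored


def masked_logit(masks):
    # masks has shape (n_samples, 3); each row scales the concept coefficients
    return (masks * concept_coeffs) @ head_weights


importance = total_sobol_indices(masked_logit, k=3)
print(importance)
```

For this linear toy head the exact total indices are proportional to the squared product of coefficient and weight, i.e. 0.9, 0.1 and 0, so the estimate should land close to those values. A real model is nonlinear, which is where a variance-based index of this kind differs from simply reading off weights.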




