Disentangled Explanations of Neural Network Predictions by Finding Relevant Subspaces

by Pattarawat Chormai et al.

Explainable AI transforms opaque decision strategies of ML models into explanations that are interpretable by the user, for example, by identifying the contribution of each input feature to the prediction at hand. Such explanations, however, entangle the potentially multiple factors that enter into the overall complex decision strategy. We propose to disentangle explanations by finding relevant subspaces in activation space that can be mapped to more abstract human-understandable concepts and enable a joint attribution on concepts and input features. To automatically extract the desired representation, we propose new subspace analysis formulations that extend the principle of PCA and subspace analysis to explanations. These novel analyses, which we call principal relevant component analysis (PRCA) and disentangled relevant subspace analysis (DRSA), optimize the relevance of projected activations rather than the more traditional variance or kurtosis. This enables a much stronger focus on subspaces that are truly relevant for the prediction and the explanation, in particular, ignoring activations or concepts to which the prediction model is invariant. Our approach is general enough to work alongside common attribution techniques such as Shapley Value, Integrated Gradients, or LRP. Our proposed methods prove practically useful and compare favorably to the state of the art, as demonstrated on benchmarks and three use cases.
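The core idea of PRCA can be illustrated with a minimal sketch. Assuming the per-sample relevance decomposes bilinearly as a dot product between an activation vector and a "context" vector (as is approximately the case for LRP-style attributions), maximizing the relevance retained after projecting onto a k-dimensional subspace reduces to an eigenvalue problem on a symmetrized cross-covariance matrix. The function name `prca` and the variable names below are illustrative, not taken from the paper's reference implementation:

```python
import numpy as np

def prca(A, C, k):
    """Sketch of principal relevant component analysis (PRCA).

    A: (n, d) array of activations a_i.
    C: (n, d) array of context vectors c_i such that the relevance of
       sample i is approximately a_i . c_i (bilinear assumption).
    k: dimension of the subspace to extract.

    Returns an orthonormal basis U of shape (d, k) maximizing the
    relevance retained after projection, sum_i (U^T a_i) . (U^T c_i).
    """
    # Symmetrized activation/context cross-covariance; its top
    # eigenvectors span the most relevant subspace.
    M = 0.5 * (A.T @ C + C.T @ A)
    eigvals, eigvecs = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, order]
```

Replacing the cross-covariance with the plain covariance `A.T @ A` recovers ordinary PCA, which highlights the difference: PRCA ranks directions by attributed relevance rather than by variance, so directions the model is invariant to (high variance, near-zero relevance) are suppressed.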




From "Where" to "What": Towards Human-Understandable Explanations through Concept Relevance Propagation

The emerging field of eXplainable Artificial Intelligence (XAI) aims to ...

Sparse Subspace Clustering for Concept Discovery (SSCCD)

Concepts are key building blocks of higher level human understanding. Ex...

Revealing Hidden Context Bias in Segmentation and Object Detection through Concept-specific Explanations

Applying traditional post-hoc attribution methods to segmentation or obj...

Interpretable Directed Diversity: Leveraging Model Explanations for Iterative Crowd Ideation

Feedback can help crowdworkers to improve their ideations. However, curr...

Rationalization through Concepts

Automated predictions require explanations to be interpretable by humans...

Improving Explainability of Disentangled Representations using Multipath-Attribution Mappings

Explainable AI aims to render model behavior understandable by humans, w...

Visual Explanations with Attributions and Counterfactuals on Time Series Classification

With the rising necessity of explainable artificial intelligence (XAI), ...
