Explaining Neural Networks by Decoding Layer Activations

05/27/2020
by Johannes Schneider, et al.

To derive explanations for deep learning models, i.e., classifiers, we propose a "CLAssifier-DECoder" architecture (ClaDec). ClaDec allows explaining the output of an arbitrary layer. To this end, it uses a decoder that transforms the non-interpretable representation of the given layer into a representation that is more similar to the training data. One can recognize what information a layer maintains by contrasting the reconstructed images of ClaDec with those of a conventional auto-encoder (AE) serving as a reference. Our extended version also allows trading off human interpretability against fidelity, so that explanations can be customized to individual needs. We evaluate our approach for image classification using CNNs. In alignment with our theoretical motivation, the qualitative evaluation highlights that reconstructed images (of the network being explained) tend to replace specific objects with more generic object templates and provide smoother reconstructions. We also show quantitatively that visualizations reconstructed from classifier encodings capture more classification-relevant information than those of conventional AEs, despite the fact that the latter retain more information about the original input.
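
A minimal sketch of the training setup the abstract describes, assuming a PyTorch implementation (the module names, the MNIST-sized decoder, and the dimensions below are illustrative assumptions, not the authors' code): the pretrained classifier is frozen and acts as the encoder, a forward hook captures the activations of the layer to be explained, and only the decoder is trained to reconstruct the input from those activations. Contrasting its reconstructions with those of a conventional auto-encoder trained end-to-end on the same data then indicates what the classifier layer keeps and what it discards.

```python
# Hypothetical sketch of ClaDec-style decoder training (not the authors' code).
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Maps a flattened layer activation back to a 1x28x28 image (MNIST-sized, assumed)."""
    def __init__(self, feat_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 28 * 28), nn.Sigmoid(),
        )

    def forward(self, z):
        return self.net(z).view(-1, 1, 28, 28)

def train_cladec_decoder(classifier, layer, loader, feat_dim, epochs=5):
    """Train only the decoder; the pretrained classifier (encoder) stays frozen.

    `layer` is the classifier sub-module whose representation should be explained;
    `feat_dim` must equal the flattened size of that layer's activations, and the
    inputs are assumed to be normalized to [0, 1] so MSE against a sigmoid output is sensible.
    """
    feats = {}
    handle = layer.register_forward_hook(lambda m, i, o: feats.update(z=o.flatten(1)))

    decoder = Decoder(feat_dim)
    opt = torch.optim.Adam(decoder.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    classifier.eval()
    for _ in range(epochs):
        for x, _ in loader:
            with torch.no_grad():
                classifier(x)              # fills feats["z"] via the hook, no gradient into the classifier
            x_rec = decoder(feats["z"])    # reconstruct the input from the layer activation
            loss = loss_fn(x_rec, x)
            opt.zero_grad()
            loss.backward()
            opt.step()

    handle.remove()
    return decoder
```

A reference auto-encoder would use the same decoder architecture but a trainable encoder, so that differences between the two sets of reconstructions can be attributed to the classifier's representation rather than to the decoder.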


