Interpreting Neural Networks Using Flip Points

03/21/2019
by Roozbeh Yousefzadeh et al.

Neural networks have been criticized for their lack of easy interpretation, which undermines confidence in their use for important applications. Here, we introduce a novel technique: interpreting a trained neural network by investigating its flip points. A flip point is any point that lies on the boundary between two output classes: e.g., for a neural network with a binary yes/no output, a flip point is any input that generates equal scores for "yes" and "no". The flip point closest to a given input is of particular importance, and it is the solution to a well-posed optimization problem. This paper gives an overview of the uses of flip points and how they are computed. Through results on standard datasets, we demonstrate how flip points can be used to provide detailed interpretation of the output produced by a neural network. Moreover, for a given input, flip points enable us to measure confidence in the correctness of outputs much more effectively than the softmax score. They also identify influential features of the inputs, reveal bias, and find changes in the input that change the output of the model. We show that the distance between an input and its closest flip point identifies the most influential points in the training data. Using principal component analysis (PCA) and rank-revealing QR factorization (RR-QR), the set of directions from each training input to its closest flip point explains how a trained neural network processes an entire dataset: which features are most important for classification into a given class, which features are most responsible for particular misclassifications, how an adversary might fool the network, and so on. Although we investigate flip points for neural networks, their usefulness is model-agnostic.
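
To make the optimization concrete, below is a minimal sketch of computing the closest flip point for a binary classifier. It assumes a PyTorch model returning two logits and uses a quadratic penalty on the score gap in place of an exact boundary constraint; the function name, penalty weight, and step count are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch (assumed formulation, not the paper's exact method):
# find the flip point closest to input x for a binary PyTorch classifier
# by penalizing the squared gap between the two class scores.
import torch

def closest_flip_point(model, x, steps=500, lr=1e-2, penalty=100.0):
    """Approximately minimize ||x_hat - x||^2 while driving the two
    class scores together, so x_hat lands on the decision boundary."""
    x_hat = x.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([x_hat], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        logits = model(x_hat)               # shape (2,): scores for yes/no
        gap = (logits[0] - logits[1]) ** 2  # zero exactly on the boundary
        loss = torch.sum((x_hat - x) ** 2) + penalty * gap
        loss.backward()
        opt.step()
    return x_hat.detach()

# The distance to the closest flip point then serves as a confidence
# measure for the prediction at x:
# confidence = torch.norm(closest_flip_point(model, x) - x)
```

The dataset-level analysis can be sketched the same way: stack the directions from training inputs to their closest flip points and examine them with PCA and a pivoted QR factorization. Using `scipy.linalg.qr` with column pivoting as the rank-revealing QR is an assumption about tooling, not the authors' implementation.

```python
# Minimal sketch: summarize directions from inputs to their closest flip
# points. PCA exposes the dominant directions of change across the
# dataset; the leading column pivots of a rank-revealing QR point to the
# individual features that drive those changes.
import numpy as np
from scipy.linalg import qr

def analyze_directions(directions, k=5):
    """directions: (n_samples, n_features) array with rows x_flip - x."""
    centered = directions - directions.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    principal_dirs = vt[:k]            # top-k principal directions
    _, _, piv = qr(directions, mode='economic', pivoting=True)
    return principal_dirs, piv[:k]     # leading pivots = influential features
```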

Related research

- Auditing and Debugging Deep Learning Models via Decision Boundaries: Individual-level and Group-level Analysis (01/03/2020). Deep learning models have been criticized for their lack of easy interpr...

- Physically Interpretable Neural Networks for the Geosciences: Applications to Earth System Variability (12/04/2019). Neural networks have become increasingly prevalent within the geoscience...

- Interpreting Neural Networks With Nearest Neighbors (09/08/2018). Local model interpretation methods explain individual predictions by ass...

- Sensitivity based Neural Networks Explanations (12/03/2018). Although neural networks can achieve very high predictive performance on...

- Methods for Interpreting and Understanding Deep Neural Networks (06/24/2017). This paper provides an entry point to the problem of interpreting a deep...

- Detecting unusual input to neural networks (06/15/2020). Evaluating a neural network on an input that differs markedly from the t...

- Rank Projection Trees for Multilevel Neural Network Interpretation (12/01/2018). A variety of methods have been proposed for interpreting nodes in deep n...
