Gradient Alignment in Deep Neural Networks

06/16/2020
by Suraj Srinivas, et al.

One cornerstone of interpretable deep learning is the high degree of visual alignment that input-gradients, i.e., the gradients of the output w.r.t. the inputs, exhibit with the input data. This alignment is commonly assumed to arise from the model's generalization, which justifies its use for interpretability. However, recent work has shown that models can be 'fooled' into having arbitrary input-gradients while still generalizing well, falsifying that assumption. This leaves an open question: if not generalization, what causes input-gradients to align with input data? In this work, we first show that it is simple to 'fool' input-gradients using the shift-invariance property of softmax, and that gradient structure is unrelated to model generalization. Second, we re-interpret the logits of standard classifiers as unnormalized log-densities of the data distribution, and find that gradient alignment can be improved via a generative modelling objective called score-matching. To show this, we derive a novel approximation to the score-matching objective that eliminates the need for expensive Hessian computations, which may be of independent interest. Our experiments identify one factor that causes input-gradient alignment: the approximate generative modelling behaviour of the normalized logit distributions.
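The softmax 'fooling' mechanism rests on a simple fact: adding the same input-dependent scalar to every logit leaves the softmax output, and hence the classifier's predictions, unchanged, while freely altering each logit's input-gradient. Below is a minimal NumPy sketch of that argument; the toy linear model W and the shift g(x) are illustrative choices, not the paper's construction.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                 # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 784))      # toy linear classifier: logits f(x) = W x
x = rng.normal(size=784)

# Shift every logit by the SAME arbitrary function of the input.
g = 3.0 * (x ** 2).sum()            # any scalar g(x) works; this one is quadratic

p_plain = softmax(W @ x)
p_shifted = softmax(W @ x + g)
print(np.allclose(p_plain, p_shifted))   # True: predictions are unchanged

# Yet the input-gradient of logit k changes from w_k to w_k + 6x,
# so gradient structure can be manipulated without touching accuracy.
```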
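The re-interpretation of logits as unnormalized log-densities is the standard energy-based reading of a classifier, sketched below; Z denotes the (intractable) normalizer, and the usual softmax classifier falls out of the conditional because Z cancels.

```latex
p(x, y) = \frac{\exp\!\big(f_y(x)\big)}{Z}, \qquad
Z = \sum_{y'} \int \exp\!\big(f_{y'}(x')\big)\, dx'
\qquad \text{(logits } f_y \text{ read as unnormalized log-densities)}

p(x) = \sum_y p(x, y) = \frac{1}{Z} \sum_y \exp\!\big(f_y(x)\big)

p(y \mid x) = \frac{p(x, y)}{p(x)}
            = \frac{\exp\!\big(f_y(x)\big)}{\sum_{y'} \exp\!\big(f_{y'}(x)\big)}
            = \mathrm{softmax}\big(f(x)\big)_y
```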
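Score-matching fits log p(x) by matching its gradient (the score), but the naive objective requires the trace of the Hessian of log p, which is expensive for high-dimensional inputs. Below is a hedged PyTorch sketch of one standard Hessian-free estimator, combining a Hutchinson probe with a finite difference of the score; the function names and hyperparameters (eps, n_probe) are illustrative, and this is a generic construction rather than necessarily the paper's exact approximation.

```python
import torch

def score(log_density, x):
    """s(x) = d log p(x) / dx, computed with autograd."""
    x = x.detach().requires_grad_(True)
    (s,) = torch.autograd.grad(log_density(x).sum(), x, create_graph=True)
    return s

def score_matching_loss(log_density, x, eps=1e-3, n_probe=1):
    """Hyvarinen's objective  E[ tr(ds/dx) + 0.5 ||s||^2 ],  with the Hessian
    trace replaced by Hutchinson probes plus a finite difference of the
    score, so no second derivative is ever materialized."""
    s = score(log_density, x)
    trace_est = torch.zeros(x.shape[0])
    for _ in range(n_probe):
        v = torch.randn_like(x)                      # E[v v^T] = I
        s_plus = score(log_density, x + eps * v)
        # (s(x + eps*v) - s(x)) / eps  ~  H v,  so  v . (H v)  ~  v^T H v
        trace_est = trace_est + ((s_plus - s) * v).sum(dim=-1) / eps
    trace_est = trace_est / n_probe
    return (trace_est + 0.5 * (s ** 2).sum(dim=-1)).mean()

# Toy usage: read a classifier's logits as unnormalized log-densities,
# log p(x) = logsumexp_y f_y(x) (up to the constant log Z).
net = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 5))
log_p = lambda x: torch.logsumexp(net(x), dim=-1)
loss = score_matching_loss(log_p, torch.randn(16, 2))
loss.backward()                                      # trainable end-to-end
```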

Related research

08/22/2023
Understanding Hessian Alignment for Domain Generalization
Out-of-distribution (OOD) generalization is a critical ability for deep ...

03/31/2021
Convolutional Dynamic Alignment Networks for Interpretable Classifications
We introduce a new family of neural network models called Convolutional ...

12/19/2019
Quantifying the effect of representations on task complexity
We examine the influence of input data representations on learning compl...

09/27/2021
Optimising for Interpretability: Convolutional Dynamic Alignment Networks
We introduce a new family of neural network models called Convolutional ...

08/03/2020
Making Coherence Out of Nothing At All: Measuring the Evolution of Gradient Alignment
We propose a new metric (m-coherence) to experimentally study the alignm...

05/30/2023
Which Models have Perceptually-Aligned Gradients? An Explanation via Off-Manifold Robustness
One of the remarkable properties of robust computer vision models is tha...

05/20/2022
B-cos Networks: Alignment is All We Need for Interpretability
We present a new direction for increasing the interpretability of deep n...
