The effect of Target Normalization and Momentum on Dying ReLU

05/13/2020
by Isac Arnekvist, et al.

Optimizing parameters with momentum, normalizing data values, and using rectified linear units (ReLUs) are popular choices in neural network (NN) regression. Although ReLUs are popular, they can collapse to a constant function and "die", effectively removing their contribution from the model. While some mitigations are known, the underlying reasons why ReLUs die during optimization remain poorly understood. In this paper, we consider the effects of target normalization and momentum on dying ReLUs. We find empirically that unit-variance targets are well motivated and that ReLUs die more easily as the target variance approaches zero. To investigate this further, we analyze a discrete-time linear autonomous system and show theoretically how it relates to a model with a single ReLU and how common properties can result in dying ReLUs. We also analyze the gradients of a single-ReLU model to identify saddle points and regions corresponding to dying ReLUs, and how parameters evolve into these regions when momentum is used. Finally, we show empirically that this problem persists, and is aggravated, for deeper models, including residual networks.
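
The abstract's central observation, that a ReLU unit is pushed into an always-off regime more easily when targets are rescaled toward zero variance and momentum is used, can be probed with a small numerical experiment. The sketch below is an illustration only, not the authors' experimental setup: the single-ReLU model f(x) = a * relu(w*x + b), the sine target, and all hyperparameters are assumptions chosen for demonstration.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def train_single_relu(target_scale, lr=0.05, momentum=0.9, steps=2000, seed=0):
    """Fit f(x) = a * relu(w*x + b) to y = target_scale * sin(3x) with
    full-batch gradient descent plus heavy-ball momentum. Returns True if
    the unit is dead at the end (pre-activation <= 0 for every input)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=256)
    y = target_scale * np.sin(3.0 * x)        # target variance shrinks with target_scale

    w, b, a = rng.normal(size=3)              # parameters of the single-ReLU model
    vw = vb = va = 0.0                        # momentum buffers

    for _ in range(steps):
        z = w * x + b
        h = relu(z)
        err = a * h - y                       # residuals of the MSE loss 0.5*mean(err**2)
        gate = (z > 0).astype(float)          # ReLU derivative: 1 where the unit fires

        gw = np.mean(err * a * gate * x)      # dL/dw
        gb = np.mean(err * a * gate)          # dL/db
        ga = np.mean(err * h)                 # dL/da

        vw = momentum * vw - lr * gw          # heavy-ball momentum updates
        vb = momentum * vb - lr * gb
        va = momentum * va - lr * ga
        w, b, a = w + vw, b + vb, a + va

    return bool(np.all(w * x + b <= 0))       # dead: zero output and zero gradients for all x

for scale in (1.0, 0.1, 0.01):
    dead = [train_single_relu(scale, seed=s) for s in range(50)]
    print(f"target scale {scale:>4}: {np.mean(dead):.0%} of runs end with a dead ReLU")
```

Once the pre-activation is non-positive for every training input, all three gradients vanish and the unit cannot recover, which is the dying behavior the paper analyzes; the script simply reports how often this terminal state is reached for different target scales.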

Related research

SkelEx and BoundEx: Natural Visualization of ReLU Neural Networks (05/09/2023)
Despite their limited interpretability, weights and biases are still the...

Spline Representation and Redundancies of One-Dimensional ReLU Neural Network Models (07/29/2022)
We analyze the structure of a one-dimensional deep ReLU neural network (...

Neural Characteristic Activation Value Analysis for Improved ReLU Network Feature Learning (05/25/2023)
We examine the characteristic activation values of individual ReLU units...

ReLU Network Approximation in Terms of Intrinsic Parameters (11/15/2021)
This paper studies the approximation error of ReLU networks in terms of ...

Dynamic ReLU (03/22/2020)
Rectified linear units (ReLU) are commonly used in deep neural networks....

Scaling Up Exact Neural Network Compression by ReLU Stability (02/15/2021)
We can compress a neural network while exactly preserving its underlying...

Accumulated Gradient Normalization (10/06/2017)
This work addresses the instability in asynchronous data parallel optimi...
