On the Convergence and Robustness of Batch Normalization

09/29/2018
by Yongqiang Cai, et al.

Despite its empirical success, the theoretical underpinnings of the stability, convergence, and acceleration properties of batch normalization (BN) remain elusive. In this paper, we approach the problem from a modeling perspective, performing a thorough theoretical analysis of BN applied to a simplified model: ordinary least squares (OLS). We discover that gradient descent on OLS with BN has interesting properties, including a scaling law, convergence for arbitrary learning rates on the weights, asymptotic acceleration effects, and insensitivity to the choice of learning rates. We then demonstrate numerically that these findings are not specific to the OLS problem and hold qualitatively for more complex supervised learning problems. This points to a new direction for uncovering the mathematical principles that underlie batch normalization.
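The BN-on-OLS setting described in the abstract is simple enough to reproduce in a few lines. Below is a minimal numerical sketch, assuming a BN-OLS objective of the form a * (w^T x - mean) / std trained by plain gradient descent; the data-generating setup, learning rates, and all variable names are illustrative assumptions, not taken from the paper. It illustrates the claim that gradient descent on the weights converges even for very large learning rates.

```python
import numpy as np

# Minimal sketch (assumed setup, not the paper's exact formulation):
# gradient descent on OLS where the linear output w^T x is batch-normalized,
#   f(x) = a * (w^T x - mean) / std,
# with mean/std computed over the batch.

rng = np.random.default_rng(0)
n, d = 1024, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def loss_and_grads(a, w, eps=1e-8):
    z = X @ w                          # raw linear output, shape (n,)
    mu, sigma = z.mean(), z.std() + eps
    zhat = (z - mu) / sigma            # batch-normalized output
    r = a * zhat - y                   # residuals
    loss = 0.5 * np.mean(r ** 2)
    ga = np.mean(r * zhat)             # dL/da
    # dL/dw via the standard batch-norm backward pass (chain rule over the batch)
    dz = (a / sigma) * (r - r.mean() - zhat * np.mean(r * zhat)) / n
    gw = X.T @ dz
    return loss, ga, gw

a, w = 0.0, rng.normal(size=d)
lr_a, lr_w = 0.1, 100.0                # deliberately huge step on w
for _ in range(2000):
    loss, ga, gw = loss_and_grads(a, w)
    a -= lr_a * ga
    w -= lr_w * gw

print(f"final loss: {loss:.4f}")       # small final loss despite lr_w = 100
```

The mechanism behind the robustness is that the normalized loss is invariant to the scale of w, so an overly large step mostly inflates the norm of w, which in turn shrinks the effective step size automatically; replacing zhat with the raw output z makes the same lr_w diverge immediately.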

Related research

12/10/2018 · Theoretical Analysis of Auto Rate-Tuning by Batch Normalization
Batch Normalization (BN) has become a cornerstone of deep learning acros...

05/27/2018 · Towards a Theoretical Understanding of Batch Normalization
Normalization techniques such as Batch Normalization have been applied v...

06/20/2023 · The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks
We study the implicit bias of batch normalization trained by gradient de...

02/25/2016 · Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks
We present weight normalization: a reparameterization of the weight vect...

01/08/2021 · Towards Accelerating Training of Batch Normalization: A Manifold Perspective
Batch normalization (BN) has become a crucial component across diverse d...

12/11/2018 · Controlling Covariate Shift using Equilibrium Normalization of Weights
We introduce a new normalization technique that exhibits the fast conver...

02/25/2020 · Separating the Effects of Batch Normalization on CNN Training Speed and Stability Using Classical Adaptive Filter Theory
Batch Normalization (BatchNorm) is commonly used in Convolutional Neural...