The Multiplicative Noise in Stochastic Gradient Descent: Data-Dependent Regularization, Continuous and Discrete Approximation

06/18/2019
by   Jingfeng Wu, et al.

The randomness in Stochastic Gradient Descent (SGD) is considered to play a central role in the observed strong generalization capability of deep learning. In this work, we re-interpret the stochastic gradient of vanilla SGD as a matrix-vector product between the matrix of per-sample gradients and a random noise vector (the multiplicative noise, M-Noise). In contrast to existing theory that explains SGD using additive noise, the M-Noise leads to a general family of algorithms, Multiplicative SGD (M-SGD). The advantage of M-SGD is that it decouples the noise from the parameters, providing clear insight into the inherent randomness of SGD. Our analysis shows that 1) the M-SGD family, including vanilla SGD, can be viewed as a minimizer with a data-dependent regularizer resembling the Rademacher complexity, which contributes to the implicit bias of M-SGD; 2) under a Gaussian noise assumption, M-SGD converges strongly to a continuous stochastic differential equation, ensuring path-wise closeness of the discrete and continuous dynamics. For applications, based on M-SGD we design a fast algorithm to inject noise of different types (e.g., Gaussian and Bernoulli) into gradient descent. Using this algorithm, we further demonstrate that M-SGD can approximate SGD with various noise types and recover its generalization performance, which reveals the potential of M-SGD for practical deep learning problems, e.g., large-batch training with strong generalization. We validate our observations in multiple practical deep learning scenarios.
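To make the multiplicative-noise viewpoint concrete, here is a minimal sketch (not the authors' code; the toy least-squares problem, the noise scalings, and all names below are illustrative assumptions). It writes the minibatch gradient as G @ w, where G stacks per-sample gradients column-wise and w is a random sampling vector with mean 1/n, and swaps w for Gaussian or Bernoulli noise with the same mean to mimic the M-SGD variants described in the abstract.

```python
import numpy as np

# Toy least-squares problem (assumption: stand-in for a real training loss).
rng = np.random.default_rng(0)
n, d = 256, 10
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def per_sample_grads(theta):
    # Gradient of 0.5 * (x_i^T theta - y_i)^2 for each sample i; shape (d, n).
    residual = X @ theta - y
    return (X * residual[:, None]).T

def noise_vector(batch_size, kind):
    # Every variant has E[w] = (1/n) * 1, so E[G @ w] is the full-batch gradient.
    if kind == "minibatch":          # vanilla SGD: uniform minibatch sampling
        w = np.zeros(n)
        w[rng.choice(n, batch_size, replace=False)] = 1.0 / batch_size
    elif kind == "gaussian":         # Gaussian M-Noise (variance matched per coordinate)
        w = 1.0 / n + rng.normal(scale=1.0 / n, size=n) * np.sqrt(n / batch_size - 1)
    elif kind == "bernoulli":        # Bernoulli M-Noise: keep each sample with prob B/n
        p = batch_size / n
        w = (rng.random(n) < p) / (n * p)
    return w

def m_sgd(kind, lr=0.05, batch_size=32, steps=500):
    theta = np.zeros(d)
    for _ in range(steps):
        G = per_sample_grads(theta)                  # (d, n) matrix of gradients
        theta = theta - lr * G @ noise_vector(batch_size, kind)
    return 0.5 * np.mean((X @ theta - y) ** 2)       # final training loss

for kind in ["minibatch", "gaussian", "bernoulli"]:
    print(kind, m_sgd(kind))
```

Under these assumptions all three noise types drive the same expected update and converge to comparable losses, which is the sense in which M-SGD with injected noise can approximate vanilla SGD.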


research
12/14/2021

Non Asymptotic Bounds for Optimization via Online Multiplicative Stochastic Gradient Descent

The gradient noise of Stochastic Gradient Descent (SGD) is considered to...
research
02/13/2021

Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections

Gaussian noise injections (GNIs) are a family of simple and widely-used ...
research
02/10/2021

On Minibatch Noise: Discrete-Time SGD, Overparametrization, and Bayes

The noise in stochastic gradient descent (SGD), caused by minibatch samp...
research
06/07/2023

Stochastic Collapse: How Gradient Noise Attracts SGD Dynamics Towards Simpler Subnetworks

In this work, we reveal a strong implicit bias of stochastic gradient de...
research
06/15/2020

Shape Matters: Understanding the Implicit Bias of the Noise Covariance

The noise in stochastic gradient descent (SGD) provides a crucial implic...
research
04/03/2019

A Stochastic Interpretation of Stochastic Mirror Descent: Risk-Sensitive Optimality

Stochastic mirror descent (SMD) is a fairly new family of algorithms tha...
research
10/03/2019

Partial differential equation regularization for supervised machine learning

This article is an overview of supervised machine learning problems for ...
