An Empirical Study of the Occurrence of Heavy-Tails in Training a ReLU Gate

04/26/2022
by Sayar Karmakar, et al.

A recent line of work on stochastic deep-learning algorithms has uncovered a rather mysterious heavy-tailed nature of the stationary distribution of these algorithms, even when the data distribution is not heavy-tailed. Moreover, the heavy-tail index is known to depend in interesting ways on the input dimension of the net, the mini-batch size and the step size of the algorithm. In this short note, we undertake an experimental study of this index for S.G.D. while training a ReLU gate (in the realizable and in the binary classification setup) and for a variant of S.G.D. that was proven to converge in Karmakar and Mukherjee (2022) for ReLU realizable data. From our experiments we conjecture that these two algorithms have similar heavy-tail behaviour on any data where the latter can be proven to converge. Secondly, we demonstrate that the heavy-tail index of the late-time iterates in this model scenario has strikingly different properties from either what has been proven for linear hypothesis classes or what has been previously demonstrated for large nets.
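To make the kind of experiment described in the abstract concrete, here is a minimal sketch in Python. It is not the authors' code. It assumes Gaussian inputs, a squared loss, and the Hill estimator as the tail-index estimator; the abstract does not specify any of these. The names `sgd_relu_gate` and `hill_estimator`, and all default hyperparameters, are hypothetical.

```python
import numpy as np


def sgd_relu_gate(w_star, steps=2000, batch=8, lr=0.05, rng=None):
    """Mini-batch SGD on the squared loss of a single ReLU gate.

    Assumption: data are realizable, y = max(0, <w_star, x>), with
    standard Gaussian inputs x. Returns the final iterate w.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = w_star.shape[0]
    w = rng.normal(size=d)  # random initialisation (an assumption)
    for _ in range(steps):
        x = rng.normal(size=(batch, d))
        y = np.maximum(0.0, x @ w_star)
        pre = x @ w
        resid = np.maximum(0.0, pre) - y
        # Gradient of 0.5 * mean(resid**2); the ReLU derivative is 1 where pre > 0.
        grad = ((resid * (pre > 0))[:, None] * x).mean(axis=0)
        w = w - lr * grad
    return w


def hill_estimator(samples, k):
    """Hill estimate of the tail index from the k largest samples.

    Uses 1 / mean(log(X_(i) / X_(k+1))) over the top k order statistics.
    """
    s = np.sort(np.asarray(samples, dtype=float))[::-1]  # descending order
    if not 0 < k < s.size:
        raise ValueError("k must satisfy 0 < k < number of samples")
    if s[k] <= 0:
        raise ValueError("the (k+1)-th largest sample must be positive")
    return 1.0 / np.mean(np.log(s[:k] / s[k]))
```

One possible use is to run `sgd_relu_gate` many times independently for a fixed dimension, batch size and step size, collect the norms of the late-time iterates (or of their deviation from `w_star`), and pass them to `hill_estimator`. The choice of `k` strongly affects the estimate and would need to be tuned.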


