The Spectrum of Fisher Information of Deep Networks Achieving Dynamical Isometry

06/14/2020
by Tomohiro Hayase, et al.

The Fisher information matrix (FIM) is fundamental to understanding the trainability of deep neural networks (DNNs), since it describes the local metric of the parameter space. We investigate the spectral distribution of the FIM given a single input, focusing on fully-connected networks achieving dynamical isometry. While dynamical isometry is known to keep specific backpropagated signals independent of the depth, we find that the local metric of the parameter space does depend on the depth. In particular, we obtain an exact expression for the spectrum of the single-input FIM and reveal that it concentrates around a single point that grows in proportion to the depth. To examine the spectrum, considering random initialization and the wide limit, we construct an algebraic methodology based on free probability theory, an algebraic counterpart of random matrix theory. As a byproduct, we provide the solvable spectral distribution in the two-hidden-layer case. Lastly, we empirically confirm that the spectrum of the FIM computed on small batches has the same properties as the single-input version. Experimental results also show that the FIM's dependence on the depth determines the learning rate appropriate for convergence in the initial phase of online training of DNNs.
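To make the single-input FIM and its depth dependence concrete, below is a minimal numerical sketch (not taken from the paper): it uses a deep linear network with Haar-orthogonal weights, arguably the simplest setting with exact dynamical isometry, and computes the eigenvalues of the dual matrix J J^T, which carries the nonzero spectrum of the FIM F = J^T J for a single input x. The helper names (orthogonal, dual_fim_eigenvalues) and the chosen width and depths are illustrative assumptions, not the paper's code or settings.

# Minimal sketch, assuming a deep *linear* network f(x) = W_L ... W_1 x with
# Haar-orthogonal weights (exact dynamical isometry). The single-input FIM is
# F = J^T J with J = df/dtheta; its nonzero eigenvalues equal those of the
# small dual matrix J J^T, which we assemble layer by layer.
import numpy as np

rng = np.random.default_rng(0)

def orthogonal(n):
    """Haar-random n x n orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))  # sign fix so q is Haar-distributed

def dual_fim_eigenvalues(width, depth, x):
    """Eigenvalues of J J^T for f(x) = W_L ... W_1 x with orthogonal W_l."""
    Ws = [orthogonal(width) for _ in range(depth)]
    # Forward activations: h_0 = x, h_l = W_l h_{l-1}.
    hs = [x]
    for W in Ws:
        hs.append(W @ hs[-1])
    # Backward maps: B_l = W_L ... W_{l+1}, with B_L = I.
    JJt = np.zeros((width, width))
    B = np.eye(width)
    for l in range(depth, 0, -1):
        # d f_i / d (W_l)_{jk} = (B_l)_{ij} (h_{l-1})_k, so the layer-l block
        # contributes (B_l B_l^T) * ||h_{l-1}||^2 to J J^T.
        JJt += (B @ B.T) * (hs[l - 1] @ hs[l - 1])
        B = B @ Ws[l - 1]
    return np.linalg.eigvalsh(JJt)

width = 64
x = rng.standard_normal(width) / np.sqrt(width)  # ||x||^2 close to 1
for depth in (2, 8, 32):
    eig = dual_fim_eigenvalues(width, depth, x)
    print(f"depth {depth:3d}: eigenvalues in [{eig.min():.3f}, {eig.max():.3f}]")
    # Under exact isometry every eigenvalue equals depth * ||x||^2, so the
    # spectrum concentrates on a point proportional to the depth.

In this idealized case the spectrum is a single atom at depth * ||x||^2, illustrating the concentration and linear depth dependence described in the abstract; since the largest FIM eigenvalue limits the stable gradient-descent step size, this growth is one way to see why the appropriate initial learning rate should shrink with depth.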

