Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel

10/28/2020
by Stanislav Fort et al.

In suitably initialized wide networks, small learning rates transform deep neural networks (DNNs) into neural tangent kernel (NTK) machines, whose training dynamics is well approximated by a linear weight expansion of the network at initialization. Standard training, however, diverges from its linearization in ways that are poorly understood. We study the relationship between the training dynamics of nonlinear deep networks, the geometry of the loss landscape, and the time evolution of a data-dependent NTK. We do so through a large-scale phenomenological analysis of training, synthesizing diverse measures characterizing loss landscape geometry and NTK dynamics. In multiple neural architectures and datasets, we find these diverse measures evolve in a highly correlated manner, revealing a universal picture of the deep learning process. In this picture, deep network training exhibits a highly chaotic rapid initial transient that within 2 to 3 epochs determines the final linearly connected basin of low loss containing the end point of training. During this chaotic transient, the NTK changes rapidly, learning useful features from the training data that enable it to outperform the standard initial NTK by a factor of 3 in less than 3 to 4 epochs. After this rapid chaotic transient, the NTK changes at constant velocity, and its performance matches that of full network training in 15% to 45% of epochs. Overall, our analysis reveals a striking correlation between a diverse set of metrics over training time, governed by a rapid chaotic-to-stable transition in the first few epochs, that together poses challenges and opportunities for the development of more accurate theories of deep learning.
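To make the linearized picture concrete: in the NTK regime the network is approximated by its first-order weight expansion f(x; θ) ≈ f(x; θ₀) + ∇θ f(x; θ₀)·(θ − θ₀), so training behaves like kernel regression with the data-dependent kernel Θ(x, x′) = ∇θ f(x)·∇θ f(x′). The sketch below is a minimal, illustrative JAX implementation, not the authors' code: the tiny MLP, the shapes, and the cosine-style kernel_distance helper are assumptions, included only to show how one could compute the empirical NTK at successive checkpoints and measure how quickly it moves.

```python
import jax
import jax.numpy as jnp

def mlp(params, x):
    # Tiny two-layer MLP with a scalar output (illustrative architecture only).
    w1, b1, w2, b2 = params
    h = jnp.tanh(x @ w1 + b1)
    return (h @ w2 + b2).squeeze()

def empirical_ntk(params, xs):
    # Empirical NTK Gram matrix: Theta[i, j] = <grad_theta f(x_i), grad_theta f(x_j)>.
    per_example_grads = jax.vmap(lambda x: jax.grad(mlp)(params, x))(xs)
    flat = jnp.concatenate(
        [g.reshape(xs.shape[0], -1) for g in jax.tree_util.tree_leaves(per_example_grads)],
        axis=1)
    return flat @ flat.T

def kernel_distance(theta_a, theta_b):
    # 1 minus cosine similarity between two Gram matrices; an assumed (not the
    # paper's exact) way to quantify how far the data-dependent NTK has moved.
    num = jnp.sum(theta_a * theta_b)
    return 1.0 - num / (jnp.linalg.norm(theta_a) * jnp.linalg.norm(theta_b))

key = jax.random.PRNGKey(0)
k1, k2, kx = jax.random.split(key, 3)
params = (jax.random.normal(k1, (5, 16)) / jnp.sqrt(5.0), jnp.zeros(16),
          jax.random.normal(k2, (16, 1)) / jnp.sqrt(16.0), jnp.zeros(1))
xs = jax.random.normal(kx, (8, 5))   # 8 probe inputs of dimension 5
theta_0 = empirical_ntk(params, xs)  # (8, 8) kernel at initialization
```

Recomputing the Gram matrix from later training checkpoints and plotting kernel_distance(theta_0, theta_t) over time would, in the picture the abstract describes, show rapid kernel change during the first few epochs followed by a slower, steadier drift.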


Related research

09/22/2020 · Anomalous diffusion dynamics of learning in deep neural networks
Learning in deep neural networks (DNNs) is implemented through minimizin...

06/14/2021 · Extracting Global Dynamics of Loss Landscape in Deep Learning Models
Deep learning models evolve through training to learn the manifold in wh...

10/06/2021 · Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks
In this paper, we interpret Deep Neural Networks with Complex Network Th...

09/25/2018 · The jamming transition as a paradigm to understand the loss landscape of deep neural networks
Deep learning has been immensely successful at a variety of tasks, rangi...

10/06/2022 · Critical Learning Periods for Multisensory Integration in Deep Networks
We show that the ability of a neural network to integrate information fr...

04/22/2014 · Attractor Metadynamics in Adapting Neural Networks
Slow adaption processes, like synaptic and intrinsic plasticity, abound ...
