Time Matters in Regularizing Deep Networks: Weight Decay and Data Augmentation Affect Early Learning Dynamics, Matter Little Near Convergence

05/30/2019
by Aditya Golatkar, et al.

Regularization is typically understood as improving generalization by altering the landscape of local extrema to which the model eventually converges. Deep neural networks (DNNs), however, challenge this view: we show that removing regularization after an initial transient period has little effect on generalization, even if the final loss landscape is the same as if there had been no regularization. In some cases, generalization even improves after regularization is interrupted. Conversely, if regularization is applied only after the initial transient, it has no effect on the final solution, whose generalization gap is as large as if regularization had never been applied. This suggests that what matters for training deep networks is not just whether or how to regularize, but when. The phenomena we observe are manifest across different datasets (CIFAR-10, CIFAR-100), architectures (ResNet-18, All-CNN), regularization methods (weight decay, data augmentation), and learning rate schedules (exponential, piece-wise constant). Collectively, they suggest that there is a "critical period" for regularizing deep networks that is decisive for final performance. Analysis should therefore focus on the transient rather than the asymptotic behavior of learning.
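The protocol the abstract describes, applying weight decay only during an initial transient and then switching it off, can be sketched with plain SGD on a toy objective. This is a minimal illustration, not the paper's implementation: the function name, the quadratic objective, and the cutoff epoch are all hypothetical choices made for the example.

```python
import numpy as np

def sgd_with_scheduled_weight_decay(grad_fn, w0, lr=0.1, wd=1e-2,
                                    wd_cutoff_epoch=50, epochs=100):
    """Plain SGD where L2 weight decay is active only for the first
    `wd_cutoff_epoch` epochs, then removed (a hypothetical helper
    illustrating 'regularize only during the early transient')."""
    w = np.asarray(w0, dtype=float)
    for epoch in range(epochs):
        g = grad_fn(w)
        # Weight decay adds wd * w to the gradient, but only early on.
        effective_wd = wd if epoch < wd_cutoff_epoch else 0.0
        w = w - lr * (g + effective_wd * w)
    return w

# Toy objective f(w) = 0.5 * ||w - 1||^2, with gradient w - 1.
grad = lambda w: w - np.ones_like(w)
w_final = sgd_with_scheduled_weight_decay(grad, w0=[0.0, 0.0])
```

In this convex toy problem the final iterate is the same with or without the early decay phase; the paper's point is that for DNNs the two runs generalize very differently even when the final loss landscape is unchanged.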


