FixNorm: Dissecting Weight Decay for Training Deep Neural Networks

03/29/2021
by Yucong Zhou, et al.

Weight decay is a widely used technique for training deep neural networks (DNNs). It greatly affects generalization performance, but the underlying mechanisms are not fully understood. Recent works show that for layers followed by normalization, weight decay mainly affects the effective learning rate. However, although normalization has been extensively adopted in modern DNNs, layers such as the final fully-connected layer do not satisfy this precondition. For these layers, the effects of weight decay are still unclear. In this paper, we comprehensively investigate the mechanisms of weight decay and find that, in addition to influencing the effective learning rate, weight decay has another distinct and equally important mechanism: affecting generalization performance by controlling cross-boundary risk. These two mechanisms together give a more complete explanation of the effects of weight decay. Based on this discovery, we propose a new training method called FixNorm, which discards weight decay and directly controls the two mechanisms. We also propose a simple yet effective method to tune the hyperparameters of FixNorm, which can find near-optimal solutions in a few trials. On the ImageNet classification task, training EfficientNet-B0 with FixNorm achieves 77.7%, outperforming the original baseline by a clear margin. Surprisingly, when scaling MobileNetV2 to the same FLOPS and applying the same tricks as EfficientNet-B0, training with FixNorm achieves 77.4%. These results demonstrate the importance of well-tuned training procedures and further verify the effectiveness of our approach. We set up more well-tuned baselines using FixNorm, to facilitate fair comparisons in the community.
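To make the "directly controls the two mechanisms" idea concrete, here is a minimal sketch of the norm-control half: for layers followed by normalization the loss is scale-invariant in the weights, so the effective learning rate behaves like lr / ||w||^2, and fixing ||w|| after each step pins it without weight decay. This is an illustrative sketch under our own assumptions (the function name `project_to_fixed_norm`, the choice of `target_norm`, and projecting all multi-dimensional tensors are not the paper's exact procedure, and it does not cover FixNorm's handling of cross-boundary risk for the final fully-connected layer).

```python
import torch
import torch.nn as nn

@torch.no_grad()
def project_to_fixed_norm(model: nn.Module, target_norm: float = 1.0) -> None:
    """Rescale each multi-dimensional weight tensor to a fixed L2 norm.

    Sketch of the norm-control mechanism: instead of letting weight decay
    shrink ||w|| (and thereby change the effective learning rate
    lr / ||w||^2 of normalized layers), fix ||w|| directly after every
    optimizer step. `target_norm` is an assumed hyperparameter here.
    """
    for p in model.parameters():
        if p.dim() > 1:  # project conv/linear weights; skip biases and norm gains
            p.mul_(target_norm / (p.norm() + 1e-12))

# Usage inside a standard training loop, with weight decay disabled:
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=0.0)
x, y = torch.randn(4, 8), torch.randint(0, 2, (4,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
opt.step()
project_to_fixed_norm(model)  # replaces weight decay's control of ||w||
```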

Related research:

- A novel adaptive learning rate scheduler for deep neural networks (02/20/2019)
- How to decay your learning rate (03/23/2021)
- Three Mechanisms of Weight Decay Regularization (10/29/2018)
- Rotational Optimizers: Simple Robust DNN Training (05/26/2023)
- Intraclass clustering: an implicit learning ability that regularizes DNNs (03/11/2021)
- On layer-level control of DNN training and its impact on generalization (06/05/2018)
- Bag of Tricks for Adversarial Training (10/01/2020)
