A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay

03/26/2018
by Leslie N. Smith, et al.

Although deep learning has produced dazzling successes for applications of image, speech, and video processing in the past few years, most training runs use suboptimal hyper-parameters and therefore take unnecessarily long. Setting the hyper-parameters remains a black art that requires years of experience to acquire. This report proposes several efficient ways to set the hyper-parameters that significantly reduce training time and improve performance. Specifically, this report shows how to examine the training and validation/test loss curves for subtle clues of underfitting and overfitting, and suggests guidelines for moving toward the optimal balance point. Then it discusses how to increase/decrease the learning rate/momentum to speed up training. Our experiments show that it is crucial to balance every manner of regularization for each dataset and architecture. Weight decay is used as a sample regularizer to show how its optimal value is tightly coupled with the learning rates and momentums.
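To make the idea of increasing and then decreasing the learning rate while moving the momentum in the opposite direction more concrete, here is a minimal sketch. The abstract does not specify the schedule shape, so the triangular (linear up, linear down) form, the function name `one_cycle`, and every default value are assumptions chosen purely for illustration, not the report's recommended settings.

```python
def one_cycle(step, total_steps, lr_min=0.01, lr_max=0.1,
              mom_min=0.85, mom_max=0.95):
    """Return (learning_rate, momentum) for one triangular cycle.

    Hypothetical helper: the learning rate rises linearly from lr_min to
    lr_max over the first half of training and falls back over the second
    half, while momentum moves in the opposite direction.
    """
    if total_steps < 2:
        raise ValueError("total_steps must be at least 2")
    if not 0 <= step < total_steps:
        raise ValueError("step must lie in [0, total_steps)")

    half = (total_steps - 1) / 2.0
    # frac is 0 at the first and last step and 1 at the midpoint.
    frac = 1.0 - abs(step - half) / half

    lr = lr_min + frac * (lr_max - lr_min)
    momentum = mom_max - frac * (mom_max - mom_min)
    return lr, momentum


if __name__ == "__main__":
    # Print the schedule for a short, 11-step run.
    for s in range(11):
        lr, mom = one_cycle(s, 11)
        print(f"step {s:2d}  lr={lr:.4f}  momentum={mom:.4f}")
```

In a real training loop these two values would be written into the optimizer before each update. Because the abstract reports that the optimal weight decay is tightly coupled with the learning rates and momentums, the weight decay would likely need retuning whenever the bounds above change.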


research
11/01/2017

Don't Decay the Learning Rate, Increase the Batch Size

It is common practice to decay the learning rate. Here we show one can u...
research
10/16/2019

An Exponential Learning Rate Schedule for Deep Learning

Intriguing empirical evidence exists that deep learning can work well wi...
research
09/30/2022

Adaptive Weight Decay: On The Fly Weight Decay Tuning for Improving Robustness

We introduce adaptive weight decay, which automatically tunes the hyper-...
research
08/03/2022

Empirical Study of Overfitting in Deep FNN Prediction Models for Breast Cancer Metastasis

Overfitting is defined as the fact that the current model fits a specifi...
research
04/02/2022

AdaSmooth: An Adaptive Learning Rate Method based on Effective Ratio

It is well known that we need to choose the hyper-parameters in Momentum...
research
06/15/2020

Spherical Motion Dynamics of Deep Neural Networks with Batch Normalization and Weight Decay

We comprehensively reveal the learning dynamics of deep neural networks ...
research
08/30/2021

Tune It or Don't Use It: Benchmarking Data-Efficient Image Classification

Data-efficient image classification using deep neural networks in settin...
