Learning Rate Perturbation: A Generic Plugin of Learning Rate Schedule towards Flatter Local Minima

by Hengyu Liu, et al.

The learning rate is one of the most important hyper-parameters, with a significant influence on neural network training. Learning rate schedules are widely used in practice to adjust the learning rate according to pre-defined schedules for fast convergence and good generalization. However, existing learning rate schedules are heuristic algorithms that lack theoretical support. As a result, practitioners usually choose learning rate schedules through multiple ad-hoc trials, and the obtained schedules are sub-optimal. To boost the performance of such sub-optimal schedules, we propose a generic learning rate schedule plugin, called LEArning Rate Perturbation (LEAP), which can be applied to various learning rate schedules to improve model training by introducing a certain perturbation to the learning rate. We find that, with this simple yet effective strategy, the training process exponentially favors flat minima over sharp minima with guaranteed convergence, which leads to better generalization. In addition, we conduct extensive experiments showing that training with LEAP improves the performance of various deep learning models on diverse datasets under various learning rate schedules (including a constant learning rate).
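The idea of a schedule plugin can be sketched as a wrapper that perturbs whatever learning rate the underlying schedule produces. The sketch below is a minimal illustration, not the paper's exact method: the multiplicative Gaussian noise, the `sigma` parameter, and the cosine base schedule are all assumptions chosen for concreteness.

```python
import math
import random

random.seed(0)  # for reproducibility of the illustrative run

def leap_lr(base_lr, sigma=0.1):
    # Hypothetical LEAP-style perturbation: scale the scheduled learning
    # rate by multiplicative Gaussian noise, clamped at zero so the
    # gradient step never reverses direction.
    return max(0.0, base_lr * (1.0 + random.gauss(0.0, sigma)))

def cosine_lr(step, total_steps=100, lr_max=0.1):
    # An ordinary cosine-annealing base schedule (one of the schedules
    # a plugin like this could wrap).
    return 0.5 * lr_max * (1.0 + math.cos(math.pi * step / total_steps))

# Perturbed schedule over 100 training steps.
schedule = [leap_lr(cosine_lr(t)) for t in range(100)]
```

Because the perturbation wraps the base schedule rather than replacing it, the same one-line change applies to step decay, warmup, or a constant learning rate.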

