REX: Revisiting Budgeted Training with an Improved Schedule

07/09/2021
by John Chen, et al.

Deep learning practitioners often operate on a computational and monetary budget. Thus, it is critical to design optimization algorithms that perform well under any budget. The linear learning rate schedule is considered the best budget-aware schedule, as it outperforms most other schedules in the low budget regime. On the other hand, learning rate schedules – such as the step schedule – are known to achieve high performance when the model can be trained for many epochs. Yet, it is often not known a priori whether one's budget will be large or small; thus, the optimal choice of learning rate schedule is made on a case-by-case basis. In this paper, we frame the learning rate schedule selection problem as a combination of i) selecting a profile (i.e., the continuous function that models the learning rate schedule), and ii) choosing a sampling rate (i.e., how frequently the learning rate is updated/sampled from this profile). We propose a novel profile and sampling rate combination called the Reflected Exponential (REX) schedule, which we evaluate across seven different experimental settings with both SGD and Adam optimizers. REX outperforms the linear schedule in the low budget regime, while matching or exceeding the performance of several state-of-the-art learning rate schedules (linear, step, exponential, cosine, step decay on plateau, and OneCycle) in both high and low budget regimes. Furthermore, REX requires no added computation, storage, or hyperparameters.
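The profile-plus-sampling-rate framing described above can be illustrated with a short PyTorch sketch. This is a minimal illustration, not the paper's reference implementation: the names rex_profile and make_schedule are invented here, and the functional form used for the profile is an assumed reflected-exponential-style decay rather than a formula quoted from the paper.

```python
# Minimal sketch of the "profile + sampling rate" framing from the abstract.
# ASSUMPTION: the profile below, lr_0 * (1 - x) / (0.5 + 0.5 * (1 - x)), is an
# illustrative reflected-exponential-style curve, not quoted from the paper.
import torch
from torch.optim.lr_scheduler import LambdaLR


def rex_profile(x: float) -> float:
    """Assumed profile, evaluated at training progress x in [0, 1]."""
    return (1.0 - x) / (0.5 + 0.5 * (1.0 - x))


def make_schedule(optimizer, total_steps: int, profile=rex_profile):
    """Sample the profile once per step (the sampling rate) via LambdaLR."""
    return LambdaLR(
        optimizer,
        lr_lambda=lambda step: profile(min(step / total_steps, 1.0)),
    )


# Usage: works with SGD or Adam and needs no hyperparameters beyond the budget.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = make_schedule(optimizer, total_steps=1000)

for step in range(1000):
    x = torch.randn(32, 10)
    loss = model(x).pow(2).mean()   # dummy objective for illustration
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()                # learning rate follows the profile
```

Sampling the profile every step is one choice of sampling rate; sampling it once per epoch instead only changes how total_steps and the step counter are interpreted.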


