Descending through a Crowded Valley – Benchmarking Deep Learning Optimizers

07/03/2020
by Robin M. Schmidt, et al.

Choosing the optimizer is among the most crucial decisions a deep learning engineer makes, and it is not an easy one. The growing literature now lists hundreds of optimization methods. In the absence of clear theoretical guidance and conclusive empirical evidence, the decision is often made based on personal anecdotes. In this work, we aim to replace these anecdotes, if not with evidence, then at least with heuristics. To do so, we perform an extensive, standardized benchmark of more than a dozen particularly popular deep learning optimizers, while giving a concise overview of the wide range of possible choices. Analyzing almost 35,000 individual runs, we contribute the following three points: (i) Optimizer performance varies greatly across tasks. (ii) Evaluating multiple optimizers with default parameters works approximately as well as tuning the hyperparameters of a single, fixed optimizer. (iii) While we cannot identify an individual optimization method that clearly dominates across all tested tasks, we identify a significantly reduced subset of specific algorithms and parameter choices that generally provided competitive results in our experiments. This subset includes popular favorites and some less well-known contenders. We have open-sourced all our experimental results, making them available as well-tuned baselines when evaluating novel optimization methods and thereby reducing the necessary computational effort.
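As a loose illustration of the heuristic suggested by point (ii) — rather than code from the paper — the sketch below tries several popular optimizers with their library-default hyperparameters on a toy problem and keeps the one that trains best. The PyTorch model, synthetic data, candidate list, and training budget are all placeholder assumptions, not the benchmark setup used in the study.

# Minimal sketch (assumed setup, not the paper's benchmark): compare a few
# popular optimizers under default hyperparameters and keep the best one.
import torch
import torch.nn as nn

def make_model():
    # Toy model standing in for whichever architecture is being trained.
    return nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))

# Synthetic regression data, purely for illustration.
X = torch.randn(256, 10)
y = torch.randn(256, 1)
loss_fn = nn.MSELoss()

# Candidate optimizers, each with library-default hyperparameters
# (SGD needs an explicit learning rate in PyTorch).
candidates = {
    "SGD": lambda params: torch.optim.SGD(params, lr=0.01),
    "Adam": lambda params: torch.optim.Adam(params),
    "RMSprop": lambda params: torch.optim.RMSprop(params),
    "Adagrad": lambda params: torch.optim.Adagrad(params),
}

results = {}
for name, make_opt in candidates.items():
    torch.manual_seed(0)            # identical initialization for a fair comparison
    model = make_model()
    optimizer = make_opt(model.parameters())
    for _ in range(200):            # short, fixed training budget per candidate
        optimizer.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        results[name] = loss_fn(model(X), y).item()

best = min(results, key=results.get)
print(results)
print(f"Best optimizer under default settings: {best}")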


