Learning to Accelerate by the Methods of Step-size Planning

04/01/2022
by   Hengshuai Yao, et al.

Gradient descent is slow to converge for ill-conditioned and non-convex problems. An important technique for acceleration is step-size adaptation. The first part of this paper contains a detailed review of step-size adaptation methods, including Polyak step-size, L4, LossGrad, Adam and IDBD. In the second part, we propose a new class of methods for accelerating gradient descent that are quite different from existing techniques. The new methods, which we call step-size planning, use the update experience to learn an improved way of updating the parameters. The methods organize the experience into samples that are K steps away from each other to facilitate planning. From this past experience, our planning algorithm, Csawg, learns a step-size model, which is a form of multi-step machine that predicts future updates. We extend Csawg to apply step-size planning multiple steps ahead, which leads to further speedup. We discuss and highlight the projection power of the diagonal-matrix step-size for future large-scale applications. We show that, for a convex problem, our methods can surpass the convergence rate of Nesterov's accelerated gradient, 1 - √(μ/L), where μ is the strong convexity parameter of the loss function F and L is the Lipschitz constant of F'. On the classical non-convex Rosenbrock function, our planning methods achieve zero error in fewer than 500 gradient evaluations, while gradient descent takes about 10,000 gradient evaluations to reach 10^-3 accuracy. We discuss the connection of step-size planning to planning in reinforcement learning, in particular Dyna architectures. We leave convergence and convergence-rate proofs, as well as applications of the methods to high-dimensional problems, for future work.
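The abstract describes the idea only at a high level. As a rough, hypothetical illustration of step-size planning (not the paper's Csawg algorithm), the sketch below records gradient-descent experience on the 2-D Rosenbrock function, fits a diagonal step-size so that a single planned step imitates K recorded steps, and then updates with the fitted step-sizes. The function names, the least-squares fit, and the loss-decrease safeguard are all assumptions made for illustration.

```python
import numpy as np

# The classical 2-D Rosenbrock function mentioned in the abstract;
# its minimum is at (1, 1) with value 0 and it is badly conditioned.
def rosenbrock(x):
    return (1.0 - x[0]) ** 2 + 100.0 * (x[1] - x[0] ** 2) ** 2

def rosenbrock_grad(x):
    return np.array([
        -2.0 * (1.0 - x[0]) - 400.0 * x[0] * (x[1] - x[0] ** 2),
        200.0 * (x[1] - x[0] ** 2),
    ])

def fit_diagonal_step_size(history, K):
    """Fit a per-coordinate step-size h so that x_t - h * g_t approximates
    x_{t+K}, i.e. one planned step mimics K recorded gradient-descent steps
    (an illustrative least-squares fit, not the paper's Csawg update)."""
    num = np.zeros(2)
    den = np.zeros(2)
    for t in range(len(history) - K):
        x_t, g_t = history[t]
        x_tk, _ = history[t + K]
        num += -(x_tk - x_t) * g_t  # correlation with the K-step displacement
        den += g_t * g_t
    return num / (den + 1e-12)

alpha, K = 1e-3, 10
x = np.array([-1.5, 1.5])
history = []

# Phase 1: plain gradient descent, recording (parameters, gradient) experience.
for _ in range(200):
    g = rosenbrock_grad(x)
    history.append((x.copy(), g.copy()))
    x = x - alpha * g

# Phase 2: planning, where one fitted step stands in for roughly K plain steps.
h = fit_diagonal_step_size(history, K)
for _ in range(300):
    g = rosenbrock_grad(x)
    x_try = x - h * g
    # Safeguard (not from the paper): keep the planned step only if it lowers the loss.
    x = x_try if rosenbrock(x_try) < rosenbrock(x) else x - alpha * g

print("fitted diagonal step-size:", h, "final loss:", rosenbrock(x))
```

On ill-conditioned problems, a fit of this kind tends to assign much larger per-coordinate step-sizes to the flat directions than the base step-size α, which is one way to read the abstract's remark about the projection power of a diagonal-matrix step-size.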


Related research

02/18/2021  On the Convergence of Step Decay Step-Size for Stochastic Optimization
The convergence of stochastic gradient descent is highly dependent on th...

01/15/2020  Theoretical Interpretation of Learned Step Size in Deep-Unfolded Gradient Descent
Deep unfolding is a promising deep-learning technique in which an iterat...

08/05/2019  Extending the step-size restriction for gradient descent to avoid strict saddle points
We provide larger step-size restrictions for which gradient descent base...

02/23/2023  A subgradient method with constant step-size for ℓ_1-composite optimization
Subgradient methods are the natural extension to the non-smooth case of ...

01/31/2022  Step-size Adaptation Using Exponentiated Gradient Updates
Optimizers like Adam and AdaGrad have been very successful in training l...

06/15/2023  MinMax Networks
While much progress has been achieved over the last decades in neuro-ins...

04/07/2015  From Averaging to Acceleration, There is Only a Step-size
We show that accelerated gradient descent, averaged gradient descent and...
