Differentiable Self-Adaptive Learning Rate

10/19/2022
by Bozhou Chen, et al.

Learning rate adaptation is a popular topic in machine learning. Vanilla gradient descent trains a neural network with a fixed learning rate; adaptation methods were proposed to accelerate training by adjusting the step size during the training session. Well-known approaches include Momentum, Adam, and Hypergradient. Hypergradient stands apart from the others: it achieves adaptation by computing the derivative of the cost function with respect to the learning rate and then applying gradient descent to the learning rate itself. However, Hypergradient is still not perfect. In practice, it frequently fails to decrease the training loss after adapting the learning rate. Moreover, evidence shows that Hypergradient is ill-suited to large datasets trained in minibatches. Most unfortunately, even when it drives the training loss to a very small value, Hypergradient often fails to reach good accuracy on the validation set. To address these problems, we propose a novel adaptation algorithm in which the learning rate is parameter-specific and internally structured. We conduct extensive experiments on multiple network models and datasets, comparing against various benchmark optimizers. The results show that our algorithm converges faster and to higher-quality solutions than those state-of-the-art optimizers.
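For context, the Hypergradient baseline discussed above admits a compact sketch. The following is a minimal NumPy illustration of Hypergradient Descent as described by Baydin et al. (2018), not the algorithm proposed in this paper; `grad_fn`, `alpha`, and `beta` are illustrative names for the gradient oracle, the learning rate, and the hypergradient step size.

```python
import numpy as np

def hypergradient_descent(grad_fn, theta, alpha=0.01, beta=1e-4, steps=100):
    """Gradient descent whose scalar learning rate `alpha` is itself
    adapted by gradient descent (Hypergradient Descent, Baydin et al., 2018).

    Since theta_t = theta_{t-1} - alpha * g_{t-1}, the derivative of the
    cost with respect to alpha is -g_t . g_{t-1}, so descending on alpha
    amounts to nudging it along the dot product of successive gradients.
    """
    prev_grad = np.zeros_like(theta)
    for _ in range(steps):
        g = grad_fn(theta)
        alpha += beta * float(g @ prev_grad)  # hypergradient step on alpha
        theta = theta - alpha * g             # ordinary parameter step
        prev_grad = g
    return theta, alpha

# Toy usage: minimize f(x) = 0.5 * ||x||^2, whose gradient is x.
theta, alpha = hypergradient_descent(lambda x: x, np.array([3.0, -2.0]))
print(theta, alpha)  # theta approaches 0; alpha has grown from 0.01
```

When successive gradients point in similar directions the dot product is positive and the learning rate grows; when they oppose each other it shrinks, which is the intuition behind the adaptation.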
