Divergence of the ADAM algorithm with fixed-stepsize: a (very) simple example

08/01/2023
by Ph. L. Toint, et al.

A very simple one-dimensional function with Lipschitz continuous gradient is constructed such that the ADAM algorithm with constant stepsize, started from the origin, diverges when applied to minimize this function, even in the absence of noise on the gradient. Divergence occurs irrespective of the choice of the method's parameters.
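For concreteness, the recursion the abstract refers to is the standard ADAM update with bias correction, applied with the same stepsize alpha at every iteration. The minimal Python sketch below implements that recursion in one dimension; the quadratic objective it is run on is only an illustrative placeholder (it is not the function constructed in the paper, and on it the iterates behave well), so the snippet shows the fixed-stepsize update itself rather than the divergence result.

```python
import math

def adam_constant_stepsize(grad, x0, alpha=1e-3, beta1=0.9, beta2=0.999,
                           eps=1e-8, iters=10_000):
    """Standard ADAM recursion in one dimension with a fixed stepsize alpha."""
    x, m, v = float(x0), 0.0, 0.0
    for k in range(1, iters + 1):
        g = grad(x)                                # exact (noise-free) gradient
        m = beta1 * m + (1.0 - beta1) * g          # first-moment estimate
        v = beta2 * v + (1.0 - beta2) * g * g      # second-moment estimate
        m_hat = m / (1.0 - beta1 ** k)             # bias corrections
        v_hat = v / (1.0 - beta2 ** k)
        x -= alpha * m_hat / (math.sqrt(v_hat) + eps)  # same alpha at every step
    return x

# Illustrative placeholder objective (NOT the function constructed in the paper):
# f(x) = 0.5 * x**2, whose gradient f'(x) = x is Lipschitz continuous.
print(adam_constant_stepsize(grad=lambda x: x, x0=0.5))
```

On this convex placeholder the iterates stay bounded and approach the minimizer. The paper's contribution is a different function, also with Lipschitz continuous gradient, on which the same fixed-stepsize recursion started from the origin produces divergent iterates for every choice of alpha, beta1, beta2 and eps.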

