Convergence of the ADAM algorithm from a Dynamical System Viewpoint

10/04/2018
by Anas Barakat, et al.

Adam is a popular variant of stochastic gradient descent for finding a local minimizer of a function. The objective function is unknown, but a random estimate of the current gradient vector is observed at each round of the algorithm. This paper investigates the dynamical behavior of Adam when the objective function is non-convex and differentiable. We introduce a continuous-time version of Adam, in the form of a non-autonomous ordinary differential equation (ODE). The existence and uniqueness of the solution are established, as well as the convergence of the solution towards the stationary points of the objective function. It is also proved that the continuous-time system is a relevant approximation of the Adam iterates, in the sense that the interpolated Adam process converges weakly to the solution of the ODE.
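For context, a sketch of the standard discrete Adam recursion (in the notation of Kingma and Ba; the symbols $\gamma$, $\beta_1$, $\beta_2$, $\epsilon$ and the index $n$ are illustrative and not necessarily those used in the paper), which the non-autonomous ODE is intended to approximate in the small-stepsize regime:

% Standard Adam iterates: g_{n+1} is a stochastic estimate of \nabla f(x_n),
% \gamma is the stepsize, \beta_1, \beta_2 \in [0,1) are the moment decay rates,
% and \epsilon > 0 is a numerical stabilizer; vector operations are componentwise.
\begin{align*}
  m_{n+1} &= \beta_1\, m_n + (1-\beta_1)\, g_{n+1},\\
  v_{n+1} &= \beta_2\, v_n + (1-\beta_2)\, g_{n+1}^{\odot 2},\\
  \hat m_{n+1} &= \frac{m_{n+1}}{1-\beta_1^{\,n+1}}, \qquad
  \hat v_{n+1} = \frac{v_{n+1}}{1-\beta_2^{\,n+1}},\\
  x_{n+1} &= x_n - \gamma\, \frac{\hat m_{n+1}}{\sqrt{\hat v_{n+1}} + \epsilon}.
\end{align*}

Heuristically, interpolating the iterates on the time scale $t \approx n\gamma$ and letting the stepsize vanish yields a deterministic limiting system in the variables $(x, m, v)$; the explicit dependence on the iteration index $n$ through the bias-correction factors is consistent with the limit being a non-autonomous ODE. The precise statement is the weak-convergence result summarized in the abstract above.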


Related research

03/29/2023
Ordinary Differential Equation-based Sparse Signal Recovery
This study investigates the use of continuous-time dynamical systems for...

11/17/2016
Stochastic Gradient Descent in Continuous Time
Stochastic gradient descent in continuous time (SGDCT) provides a comput...

12/07/2020
Stochastic optimization with momentum: convergence, fluctuations, and traps avoidance
In this paper, a general stochastic optimization procedure is studied, u...

07/30/2022
A Gradient Smoothed Functional Algorithm with Truncated Cauchy Random Perturbations for Stochastic Optimization
In this paper, we present a stochastic gradient algorithm for minimizing...

12/16/2019
A Control-Theoretic Perspective on Optimal High-Order Optimization
In this paper, we provide a control-theoretic perspective on optimal ten...

07/22/2020
Examples of pathological dynamics of the subgradient method for Lipschitz path-differentiable functions
We show that the vanishing stepsize subgradient method – widely adopted ...

06/10/2021
A Continuized View on Nesterov Acceleration for Stochastic Gradient Descent and Randomized Gossip
We introduce the continuized Nesterov acceleration, a close variant of N...
