Zap Q-Learning With Nonlinear Function Approximation

10/11/2019
by Shuhang Chen et al.

The Zap stochastic approximation (SA) algorithm was introduced recently as a means to accelerate convergence in reinforcement learning algorithms. While numerical results were impressive, stability (in the sense of boundedness of parameter estimates) was established only in a few special cases. This paper generalizes the class of Zap algorithms and establishes stability under very general conditions, a result that applies to a wide range of algorithms found in reinforcement learning. Two classes are considered: (i) The natural generalization of Watkins' algorithm is not always stable in function approximation settings; parameter estimates may diverge to infinity even with linear function approximation on a simple finite state-action MDP. Under mild conditions, the Zap SA algorithm yields a stable algorithm, even in the case of nonlinear function approximation. (ii) The GQ algorithm of Maei et al. (2010) is designed to address this stability challenge. Analysis is provided to explain why the algorithm may nonetheless be very slow to converge in practice, and the new Zap GQ algorithm is shown to be stable even for nonlinear function approximation.
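To make the mechanism concrete, below is a minimal sketch (in Python with NumPy) of the Zap SA recursion specialized to Q-learning with a linear-in-parameters architecture Q_theta(x, u) = theta @ psi(x, u): a faster time-scale tracks a running estimate A_hat of the Jacobian of the mean flow, and the parameter update applies the Newton-Raphson-like matrix gain -A_hat^{-1}. The feature map, step-size exponents, exploration policy, and toy MDP here are illustrative assumptions, not the paper's construction or experiments.

import numpy as np

def zap_q_learning(env_step, psi, d, actions=(0, 1), beta=0.95,
                   n_iter=20_000, rho=0.85, seed=0):
    """Sketch of Zap Q-learning for Q_theta(x, u) = theta @ psi(x, u)."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(d)
    A_hat = -np.eye(d)          # running Jacobian estimate; starts invertible
    x = 0                       # initial state of the toy MDP (assumption)
    for n in range(1, n_iter + 1):
        alpha = 1.0 / n         # slow step size for theta
        gamma = 1.0 / n**rho    # faster step size for A_hat (gamma/alpha -> infinity)
        u = actions[int(rng.integers(len(actions)))]   # exploratory (random) policy
        r, x_next = env_step(x, u, rng)
        # Greedy action and value at the next state under the current theta
        q_vals = [psi(x_next, a) @ theta for a in actions]
        a_star = actions[int(np.argmax(q_vals))]
        zeta = psi(x, u)
        # TD-error direction: f(theta) = zeta * (r + beta * max_a Q(x', a) - Q(x, u))
        f = zeta * (r + beta * max(q_vals) - zeta @ theta)
        # One-sample estimate of the Jacobian of the mean flow at theta
        A_n = np.outer(zeta, beta * psi(x_next, a_star) - zeta)
        A_hat += gamma * (A_n - A_hat)
        # Zap matrix gain: Newton-Raphson-like step -A_hat^{-1} f;
        # lstsq guards against a near-singular early estimate
        theta += alpha * np.linalg.lstsq(-A_hat, f, rcond=None)[0]
        x = x_next
    return theta

# Toy 2-state, 2-action MDP and one-hot (tabular) features, both assumptions
def env_step(x, u, rng):
    x_next = u if rng.random() < 0.8 else 1 - u   # action u steers toward state u
    return float(x_next == 1), x_next             # reward 1 for reaching state 1

def psi(x, u):
    e = np.zeros(4)
    e[2 * x + u] = 1.0
    return e

print(zap_q_learning(env_step, psi, d=4).round(3))

With one-hot features this reduces to tabular Zap Q-learning; any fixed feature map psi can be substituted, and for architectures nonlinear in theta the vector zeta would be replaced by the gradient of Q_theta at (x, u). The key design choice is the two-time-scale condition gamma_n / alpha_n -> infinity, which lets A_hat track the Jacobian fast enough for the matrix gain to act as a Newton-Raphson step.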

Related research

07/05/2023 · Stability of Q-Learning Through Design and Optimism
Q-learning has become an important part of the reinforcement learning to...

08/08/2020 · Convex Q-Learning, Part 1: Deterministic Optimal Control
It is well known that the extension of Watkins' algorithm to general fun...

09/21/2020 · Optimal Stable Nonlinear Approximation
While it is well known that nonlinear methods of approximation can often...

09/17/2018 · Zap Meets Momentum: Stochastic Approximation Algorithms with Optimal Convergence Rate
There are two well known Stochastic Approximation techniques that are kn...

03/02/2018 · Specialized Interior Point Algorithm for Stable Nonlinear System Identification
Estimation of nonlinear dynamic models from data poses many challenges, ...

04/28/2021 · A Generalized Projected Bellman Error for Off-policy Value Estimation in Reinforcement Learning
Many reinforcement learning algorithms rely on value estimation. However...

04/27/2021 · Stability of trigonometric approximation in L^p and applications to prediction theory
Let Γ be an LCA group and (μ_n) be a sequence of bounded regular Borel m...
