On the Linear convergence of Natural Policy Gradient Algorithm

05/04/2021
by   Sajad Khodadadian, et al.
0

Markov Decision Processes are classically solved using Value Iteration and Policy Iteration algorithms. Recent interest in Reinforcement Learning has motivated the study of methods inspired by optimization, such as gradient ascent. Among these, a popular algorithm is the Natural Policy Gradient, which is a mirror descent variant for MDPs. This algorithm forms the basis of several popular Reinforcement Learning algorithms such as Natural actor-critic, TRPO, PPO, etc, and so is being studied with growing interest. It has been shown that Natural Policy Gradient with constant step size converges with a sublinear rate of O(1/k) to the global optimal. In this paper, we present improved finite time convergence bounds, and show that this algorithm has geometric (also known as linear) asymptotic convergence rate. We further improve this convergence result by introducing a variant of Natural Policy Gradient with adaptive step sizes. Finally, we compare different variants of policy gradient methods experimentally.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/13/2021

Theoretical Guarantees of Fictitious Discount Algorithms for Episodic Reinforcement Learning and Global Convergence of Policy Gradient Methods

When designing algorithms for finite-time-horizon episodic reinforcement...
research
06/14/2018

Stochastic Variance-Reduced Policy Gradient

In this paper, we propose a novel reinforcement- learning algorithm cons...
research
02/12/2020

Provably Convergent Policy Gradient Methods for Model-Agnostic Meta-Reinforcement Learning

We consider Model-Agnostic Meta-Learning (MAML) methods for Reinforcemen...
research
02/22/2023

Optimal Convergence Rate for Exact Policy Mirror Descent in Discounted Markov Decision Processes

The classical algorithms used in tabular reinforcement learning (Value I...
research
01/22/2022

Bag of Tricks for Natural Policy Gradient Reinforcement Learning

Natural policy gradient methods are popular reinforcement learning metho...
research
05/15/2022

Policy Gradient Method For Robust Reinforcement Learning

This paper develops the first policy gradient method with global optimal...
research
10/28/2021

Bayesian Sequential Optimal Experimental Design for Nonlinear Models Using Policy Gradient Reinforcement Learning

We present a mathematical framework and computational methods to optimal...

Please sign up or login with your details

Forgot password? Click here to reset