Linear Convergence of Natural Policy Gradient Methods with Log-Linear Policies

by Rui Yuan et al.

We consider infinite-horizon discounted Markov decision processes and study the convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log-linear policy class. Using the compatible function approximation framework, both methods with log-linear policies can be written as approximate versions of the policy mirror descent (PMD) method. We show that both methods attain linear convergence rates and 𝒪(1/ϵ^2) sample complexities using a simple, non-adaptive geometrically increasing step size, without resorting to entropy or other strongly convex regularization. Lastly, as a byproduct, we obtain sublinear convergence rates for both methods with arbitrary constant step size.
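The Q-NPG scheme with a geometrically increasing step size can be illustrated on a toy tabular MDP. This is a hypothetical sketch, not the paper's experimental setup: with one-hot features the log-linear class reduces to a tabular softmax policy, so the compatible least-squares fit of the Q-function is the Q-function itself, and the update θ ← θ + η_k Q^{π_k} with η_k = γ^{−k} is exactly the geometric step-size rule.

```python
import numpy as np

# Hypothetical toy MDP (not from the paper): 2 states, 2 actions.
# Action 1 always yields reward 1 and moves to state 1; action 0 yields 0.
gamma = 0.9
P = np.zeros((2, 2, 2))          # transition kernel P[s, a, s']
P[:, 0, 0] = 1.0                 # action 0 -> state 0
P[:, 1, 1] = 1.0                 # action 1 -> state 1
R = np.array([[0.0, 1.0],
              [0.0, 1.0]])       # reward R[s, a]

def policy(theta):
    # Log-linear policy with one-hot features phi(s, a): a tabular softmax.
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def q_values(pi):
    # Exact policy evaluation: V = (I - gamma P_pi)^{-1} r_pi, Q = R + gamma P V.
    r_pi = (pi * R).sum(axis=1)
    P_pi = np.einsum('sap,sa->sp', P, pi)
    V = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)
    return R + gamma * P @ V, V

theta = np.zeros((2, 2))
eta = 1.0                        # eta_k = gamma^{-k}: geometrically increasing
for k in range(30):
    Q, V = q_values(policy(theta))
    # Q-NPG step: with one-hot features the compatible least-squares fit is Q.
    theta = theta + eta * Q
    eta /= gamma

print(V)  # approaches the optimal value 1 / (1 - gamma) = 10 in both states
```

In this tabular special case the iterates approach the optimal value at a linear (geometric) rate, mirroring the behavior the abstract establishes for general log-linear policies with approximate evaluation.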



Related papers:

∙ Geometry and convergence of natural policy gradient methods (11/03/2022)
∙ On the Convergence Rates of Policy Gradient Methods (01/19/2022)
∙ Linear Convergence for Natural Policy Gradient with Log-linear Policy Parametrization (09/30/2022)
∙ A Note on the Linear Convergence of Policy Gradient Methods (07/21/2020)
∙ Convergence and Optimality of Policy Gradient Methods in Weakly Smooth Settings (10/30/2021)
∙ A new Gradient TD Algorithm with only One Step-size: Convergence Rate Analysis using L-λ Smoothness (07/29/2023)
∙ Critic Algorithms using Cooperative Networks (01/19/2022)
