An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods

11/15/2022
by Yanli Liu, et al.

In this paper, we revisit and improve the convergence of policy gradient (PG) methods, natural PG (NPG) methods, and their variance-reduced variants under general smooth policy parametrizations. More specifically, assuming that the Fisher information matrix of the policy is positive definite: i) we show that a state-of-the-art variance-reduced PG method, previously shown only to converge to stationary points, converges to the globally optimal value up to an inherent function approximation error due to policy parametrization; ii) we show that NPG enjoys a lower sample complexity; iii) we propose SRVR-NPG, which incorporates variance reduction into the NPG update. Our improvements follow from the observation that the convergence analyses of (variance-reduced) PG and NPG methods can improve each other: the stationary convergence analysis of PG applies to NPG as well, and the global convergence analysis of NPG helps establish the global convergence of (variance-reduced) PG methods. Our analysis carefully integrates the advantages of these two lines of work. Thanks to this improvement, we also make variance reduction possible for NPG, with both global convergence and an efficient finite-sample complexity.
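To make the updates discussed above concrete, below is a minimal Python sketch on a toy three-armed bandit with a softmax policy; it is an illustration under simplified assumptions, not the paper's algorithm. The step size `eta`, damping `lam`, batch sizes, and the bandit itself are illustrative choices: the paper treats general smooth policy parametrizations with full trajectories and assumes a positive-definite Fisher matrix, whereas the tabular softmax Fisher here is only positive semidefinite and must be damped.

```python
import numpy as np

rng = np.random.default_rng(0)
rewards = np.array([1.0, 0.5, 0.2])      # toy expected reward of each arm

def pi(theta):
    """Softmax policy over the three arms."""
    z = np.exp(theta - theta.max())
    return z / z.sum()

def grad_log_pi(theta, a):
    """Score function of the softmax policy: e_a - pi(theta)."""
    g = -pi(theta)
    g[a] += 1.0
    return g

def sample_grad(theta, batch):
    """Plain Monte Carlo (REINFORCE-style) policy gradient estimate."""
    p = pi(theta)
    acts = rng.choice(3, size=batch, p=p)
    rs = rewards[acts] + 0.1 * rng.standard_normal(batch)   # noisy rewards
    return np.mean([r * grad_log_pi(theta, a) for a, r in zip(acts, rs)], axis=0)

def fisher(theta):
    """Fisher information of the softmax policy: diag(p) - p p^T.
    It is only positive semidefinite here, hence the damping below."""
    p = pi(theta)
    return np.diag(p) - np.outer(p, p)

def srvr_correction(theta_new, theta_old, batch):
    """SRVR-style correction: sample at the new parameters and
    importance-weight the old-parameter gradient on the same samples."""
    p_new, p_old = pi(theta_new), pi(theta_old)
    acts = rng.choice(3, size=batch, p=p_new)
    rs = rewards[acts] + 0.1 * rng.standard_normal(batch)
    corr = np.zeros(3)
    for a, r in zip(acts, rs):
        w = p_old[a] / p_new[a]                              # importance weight
        corr += r * (grad_log_pi(theta_new, a) - w * grad_log_pi(theta_old, a))
    return corr / batch

eta, lam = 0.5, 1e-3        # step size and Fisher damping (illustrative)
theta = np.zeros(3)

# NPG step: precondition the sampled gradient with the damped Fisher matrix.
g = sample_grad(theta, batch=64)
theta = theta + eta * np.linalg.solve(fisher(theta) + lam * np.eye(3), g)

# SRVR-style variance-reduced step: anchor with a large batch, then refresh
# the running estimate v with a small, importance-corrected minibatch.
v = sample_grad(theta, batch=64)
theta_prev, theta = theta, theta + eta * v
v = v + srvr_correction(theta, theta_prev, batch=8)

# SRVR-NPG-style step: apply the same Fisher preconditioning to v.
theta = theta + eta * np.linalg.solve(fisher(theta) + lam * np.eye(3), v)

print("final policy:", pi(theta))
```

The last step mirrors the structure of the proposed SRVR-NPG: the variance-reduced estimate v is preconditioned by the (damped, estimated) Fisher matrix before the parameter update.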


