On the Global Convergence of Momentum-based Policy Gradient

10/19/2021
∙
by   Yuhao Ding, et al.
∙
0
∙

Policy gradient (PG) methods are popular and efficient for large-scale reinforcement learning due to their relative stability and incremental nature. In recent years, the empirical success of PG methods has led to the development of a theoretical foundation for these methods. In this work, we generalize this line of research by studying the global convergence of stochastic PG methods with momentum terms, which have been demonstrated to be efficient recipes for improving PG methods. We study both the soft-max and the Fisher-non-degenerate policy parametrizations, and show that adding a momentum improves the global optimality sample complexity of vanilla PG methods by 𝒊Ėƒ(Ïĩ^-1.5) and 𝒊Ėƒ(Ïĩ^-1), respectively, where Ïĩ>0 is the target tolerance. Our work is the first one that obtains global convergence results for the momentum-based PG methods. For the generic Fisher-non-degenerate policy parametrizations, our result is the first single-loop and finite-batch PG algorithm achieving Õ(Ïĩ^-3) global optimality sample complexity. Finally, as a by-product, our methods also provide general framework for analyzing the global convergence rates of stochastic PG methods, which can be easily applied and extended to different PG estimators.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
∙ 02/03/2023

Stochastic Policy Gradient Methods: Improved Sample Complexity for Fisher-non-degenerate Policies

Recently, the impressive empirical success of policy gradient (PG) metho...
research
∙ 10/22/2020

Sample Efficient Reinforcement Learning with REINFORCE

Policy gradient methods are among the most effective methods for large-s...
research
∙ 12/06/2021

MDPGT: Momentum-based Decentralized Policy Gradient Tracking

We propose a novel policy gradient method for multi-agent reinforcement ...
research
∙ 03/09/2020

Stochastic Recursive Momentum for Policy Gradient Methods

In this paper, we propose a novel algorithm named STOchastic Recursive M...
research
∙ 11/15/2022

An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods

In this paper, we revisit and improve the convergence of policy gradient...
research
∙ 09/09/2023

Global Convergence of Receding-Horizon Policy Search in Learning Estimator Designs

We introduce the receding-horizon policy gradient (RHPG) algorithm, the ...
research
∙ 07/23/2021

A general sample complexity analysis of vanilla policy gradient

The policy gradient (PG) is one of the most popular methods for solving ...

Please sign up or login with your details

Forgot password? Click here to reset