A Hybrid Stochastic Policy Gradient Algorithm for Reinforcement Learning

03/01/2020
by   Nhan H. Pham, et al.
0

We propose a novel hybrid stochastic policy gradient estimator by combining an unbiased policy gradient estimator, the REINFORCE estimator, with another biased one, an adapted SARAH estimator for policy optimization. The hybrid policy gradient estimator is shown to be biased, but has variance reduced property. Using this estimator, we develop a new Proximal Hybrid Stochastic Policy Gradient Algorithm (ProxHSPGA) to solve a composite policy optimization problem that allows us to handle constraints or regularizers on the policy parameters. We first propose a single-looped algorithm then introduce a more practical restarting variant. We prove that both algorithms can achieve the best-known trajectory complexity O(ε^-3) to attain a first-order stationary point for the composite problem which is better than existing REINFORCE/GPOMDP O(ε^-4) and SVRPG O(ε^-10/3) in the non-composite setting. We evaluate the performance of our algorithm on several well-known examples in reinforcement learning. Numerical results show that our algorithm outperforms two existing methods on these examples. Moreover, the composite settings indeed have some advantages compared to the non-composite ones on certain problems.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/18/2019

Sample Efficient Policy Gradient Methods with Recursive Variance Reduction

Improving the sample efficiency in reinforcement learning has been a lon...
research
11/06/2021

Time Discretization-Invariant Safe Action Repetition for Policy Gradient Methods

In reinforcement learning, continuous time is often discretized by a tim...
research
08/24/2018

Proximal Policy Optimization and its Dynamic Version for Sequence Generation

In sequence generation task, many works use policy gradient for model op...
research
06/25/2019

Policy Optimization with Stochastic Mirror Descent

Stochastic mirror descent (SMD) keeps the advantages of simplicity of im...
research
02/06/2021

A Hybrid Approach for Reinforcement Learning Using Virtual Policy Gradient for Balancing an Inverted Pendulum

Using the policy gradient algorithm, we train a single-hidden-layer neur...
research
05/05/2023

Model-free Reinforcement Learning of Semantic Communication by Stochastic Policy Gradient

Motivated by the recent success of Machine Learning tools in wireless co...
research
01/26/2023

Partial advantage estimator for proximal policy optimization

Estimation of value in policy gradient methods is a fundamental problem....

Please sign up or login with your details

Forgot password? Click here to reset