Sample-efficient Policy Optimization with Stein Control Variate

10/30/2017
by Hao Liu, et al.

Policy gradient methods have achieved remarkable successes in solving challenging reinforcement learning problems. However, they still often suffer from high variance in policy gradient estimation, which leads to poor sample efficiency during training. In this work, we propose a control variate method to effectively reduce the variance of policy gradient methods. Motivated by Stein's identity, our method extends the control variates used in REINFORCE and advantage actor-critic by introducing more general, action-dependent baseline functions. Empirical studies show that our method significantly improves the sample efficiency of state-of-the-art policy gradient approaches.
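
At a high level, the approach rests on Stein's identity applied to the policy distribution. The LaTeX sketch below states the identity and the kind of gradient estimator it yields for a reparameterizable policy a = f_theta(s, xi); it is an outline of the general form under standard regularity assumptions on the baseline phi(s, a), not a verbatim reproduction of the paper's equations.

```latex
% Stein's identity for the policy pi_theta(.|s): for any sufficiently smooth
% baseline phi(s, a) (with suitable decay at the boundary of the action space),
% the score-weighted baseline plus its action-gradient has zero expectation.
\mathbb{E}_{a \sim \pi_\theta(\cdot \mid s)}
  \left[ \nabla_a \log \pi_\theta(a \mid s)\, \phi(s, a) + \nabla_a \phi(s, a) \right] = 0.

% Consequently, an action-dependent baseline phi can be subtracted from the
% Q-value without bias, provided the correction term implied by the identity
% is added back. Written for a reparameterizable policy a = f_\theta(s, \xi):
\nabla_\theta J(\theta)
  = \mathbb{E}_{s, a}\!\left[ \nabla_\theta \log \pi_\theta(a \mid s)
      \left( Q^{\pi}(s, a) - \phi(s, a) \right) \right]
  + \mathbb{E}_{s, \xi}\!\left[ \nabla_\theta f_\theta(s, \xi)\,
      \nabla_a \phi(s, a) \big|_{a = f_\theta(s, \xi)} \right].
```

When phi depends only on the state, the correction term vanishes and the estimator reduces to the familiar state-value baseline of advantage actor-critic; an action-dependent phi can cancel substantially more variance. The exact construction and the choice of phi are given in the full paper.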

Related research

09/01/2017 · Mean Actor Critic
We propose a new algorithm, Mean Actor-Critic (MAC), for discrete-action...

10/17/2017 · Stochastic Variance Reduction for Policy Gradient Estimation
Recent advances in policy gradient methods and deep learning have demons...

01/03/2017 · A K-fold Method for Baseline Estimation in Policy Gradient Algorithms
The high variance issue in unbiased policy-gradient methods such as VPG ...

03/13/2018 · Learning to Explore with Meta-Policy Gradient
The performance of off-policy learning, including deep Q-learning and de...

10/29/2020 · Low-Variance Policy Gradient Estimation with World Models
In this paper, we propose World Model Policy Gradient (WMPG), an approac...

05/20/2022 · Sigmoidally Preconditioned Off-policy Learning: a new exploration method for reinforcement learning
One of the major difficulties of reinforcement learning is learning from...

02/01/2023 · Distillation Policy Optimization
On-policy algorithms are supposed to be stable, however, sample-intensiv...
