Regularly Updated Deterministic Policy Gradient Algorithm

07/01/2020
by   Shuai Han, et al.
0

Deep Deterministic Policy Gradient (DDPG) algorithm is one of the most well-known reinforcement learning methods. However, this method is inefficient and unstable in practical applications. On the other hand, the bias and variance of the Q estimation in the target function are sometimes difficult to control. This paper proposes a Regularly Updated Deterministic (RUD) policy gradient algorithm for these problems. This paper theoretically proves that the learning procedure with RUD can make better use of new data in replay buffer than the traditional procedure. In addition, the low variance of the Q value in RUD is more suitable for the current Clipped Double Q-learning strategy. This paper has designed a comparison experiment against previous methods, an ablation experiment with the original DDPG, and other analytical experiments in Mujoco environments. The experimental results demonstrate the effectiveness and superiority of RUD.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/09/2019

Deterministic Value-Policy Gradients

Reinforcement learning algorithms such as the deep deterministic policy ...
research
11/12/2021

AWD3: Dynamic Reduction of the Estimation Bias

Value-based deep Reinforcement Learning (RL) algorithms suffer from the ...
research
11/02/2021

Off-Policy Correction for Deep Deterministic Policy Gradient Algorithms via Batch Prioritized Experience Replay

The experience replay mechanism allows agents to use the experiences mul...
research
01/24/2019

Sample Complexity of Estimating the Policy Gradient for Nearly Deterministic Dynamical Systems

Reinforcement learning is a promising approach to learning robot control...
research
06/18/2020

Reducing Estimation Bias via Weighted Delayed Deep Deterministic Policy Gradient

The overestimation phenomenon caused by function approximation is a well...
research
07/10/2018

Generalized deterministic policy gradient algorithms

We study a setting of reinforcement learning, where the state transition...
research
01/26/2023

Partial advantage estimator for proximal policy optimization

Estimation of value in policy gradient methods is a fundamental problem....

Please sign up or login with your details

Forgot password? Click here to reset