Deterministic Value-Policy Gradients

09/09/2019
by   Qingpeng Cai, et al.

Reinforcement learning algorithms such as the deep deterministic policy gradient algorithm (DDPG) have been widely used in continuous control tasks. However, the model-free DDPG algorithm suffers from high sample complexity. In this paper we consider deterministic value gradients to improve the sample efficiency of deep reinforcement learning algorithms. Previous works consider deterministic value gradients with a finite horizon, which is too myopic compared with the infinite-horizon setting. We first give a theoretical guarantee of the existence of the value gradients in the infinite-horizon setting. Based on this guarantee, we propose a class of deterministic value gradient (DVG) algorithms with infinite horizon, in which the number of rollout steps of the analytical gradients through the learned model trades off the variance of the value gradients against the model bias. Furthermore, to better combine the model-based deterministic value gradient estimators with the model-free deterministic policy gradient estimator, we propose the deterministic value-policy gradient (DVPG) algorithm. Finally, we conduct extensive experiments comparing DVPG with state-of-the-art methods on several standard continuous control benchmarks. Results demonstrate that DVPG substantially outperforms other baselines.
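To make the rollout idea in the abstract concrete, the following is a minimal sketch of a k-step value gradient, not the paper's implementation. It assumes a hypothetical scalar problem: a linear policy a = theta * s, a linear "learned" model, a quadratic reward, and a quadratic critic that bootstraps the tail after k model steps. All names and constants (`A`, `B`, `P`, `Q`, `k_step_value_gradient`) are illustrative assumptions.

```python
# Hypothetical scalar setup; none of these names or constants come from the paper.
GAMMA = 0.9          # discount factor
A, B = 0.8, 0.5      # assumed learned model: s' = A*s + B*a
P, Q = 1.5, 1.0      # assumed learned critic: Q(s, a) = -(P*s^2 + Q*a^2)


def reward(s, a):
    """Assumed quadratic reward."""
    return -(s * s + a * a)


def critic(s, a):
    """Assumed action-value function used to bootstrap after k model steps."""
    return -(P * s * s + Q * a * a)


def k_step_value(theta, s0, k):
    """k discounted rewards along the model rollout plus a discounted critic tail."""
    s, total, discount = s0, 0.0, 1.0
    for _ in range(k):
        a = theta * s
        total += discount * reward(s, a)
        s = A * s + B * a
        discount *= GAMMA
    return total + discount * critic(s, theta * s)


def k_step_value_gradient(theta, s0, k):
    """Analytic d/dtheta of k_step_value, propagating ds/dtheta through the model."""
    s, ds = s0, 0.0              # the start state does not depend on theta
    grad, discount = 0.0, 1.0
    for _ in range(k):
        a = theta * s
        da = s + theta * ds      # product rule on a = theta * s
        grad += discount * (-2.0 * s * ds - 2.0 * a * da)
        s, ds = A * s + B * a, A * ds + B * da
        discount *= GAMMA
    a = theta * s
    da = s + theta * ds
    grad += discount * (-2.0 * P * s * ds - 2.0 * Q * a * da)
    return grad


if __name__ == "__main__":
    for k in (0, 1, 3):
        print(k, k_step_value_gradient(-0.3, 1.0, k))
```

In this sketch, k = 0 uses only the critic, so the gradient reduces to the critic's action gradient times the policy's parameter gradient, which is the form of a model-free deterministic policy gradient. Larger k relies more on the model's rewards and transitions and less on the critic, which is where the trade-off between model bias and the variance of the value gradient arises.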

research
07/10/2018

Generalized deterministic policy gradient algorithms

We study a setting of reinforcement learning, where the state transition...
research
11/15/2019

Improved Exploration through Latent Trajectory Optimization in Deep Deterministic Policy Gradient

Model-free reinforcement learning algorithms such as Deep Deterministic ...
research
07/01/2020

Regularly Updated Deterministic Policy Gradient Algorithm

Deep Deterministic Policy Gradient (DDPG) algorithm is one of the most w...
research
10/30/2015

Learning Continuous Control Policies by Stochastic Value Gradients

We present a unified framework for learning continuous control policies ...
research
06/12/2020

Zeroth-order Deterministic Policy Gradient

Deterministic Policy Gradient (DPG) removes a level of randomness from s...
research
06/06/2020

Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies

Offline reinforcement learning, wherein one uses off-policy data logged ...
research
02/04/2019

PIPPS: Flexible Model-Based Policy Search Robust to the Curse of Chaos

Previously, the exploding gradient problem has been explained to be cent...
