Parameter-based Value Functions

by   Francesco Faccio, et al.

Learning value functions off-policy is at the core of modern Reinforcement Learning (RL). Traditional off-policy actor-critic algorithms, however, only approximate the true policy gradient, since the gradient ∇_θ Q^π_θ(s,a) of the action-value function with respect to the policy parameters is often ignored. We introduce a class of value functions called Parameter-based Value Functions (PVFs) whose inputs include the policy parameters. PVFs can evaluate the performance of any policy given a state, a state-action pair, or a distribution over the RL agent's initial states. We show how PVFs yield exact policy gradient theorems. We derive off-policy actor-critic algorithms based on PVFs trained using Monte Carlo or Temporal Difference methods. Preliminary experimental results indicate that PVFs can effectively evaluate deterministic linear and nonlinear policies, outperforming state-of-the-art algorithms in the continuous control environment Swimmer-v3. Finally, we show how recurrent neural networks can be trained through PVFs to solve supervised and RL problems involving partial observability and long time lags between relevant events. This provides an alternative to backpropagation through time.


page 1

page 2

page 3

page 4


Convergent Actor-Critic Algorithms Under Off-Policy Training and Function Approximation

We present the first class of policy-gradient algorithms that work with ...

General Policy Evaluation and Improvement by Learning to Identify Few But Crucial States

Learning to evaluate and improve policies is a core problem of Reinforce...

Comparison of Reinforcement Learning algorithms applied to the Cart Pole problem

Designing optimal controllers continues to be challenging as systems are...

Learning Dynamics and Generalization in Reinforcement Learning

Solving a reinforcement learning (RL) problem poses two competing challe...

Learn Quasi-stationary Distributions of Finite State Markov Chain

We propose a reinforcement learning (RL) approach to compute the express...

Meta-Gradient Reinforcement Learning with an Objective Discovered Online

Deep reinforcement learning includes a broad family of algorithms that p...

Please sign up or login with your details

Forgot password? Click here to reset