Policy Learning and Evaluation with Randomized Quasi-Monte Carlo

02/16/2022
by Sébastien M. R. Arnold, et al.

Reinforcement learning constantly deals with hard integrals, for example when computing expectations in policy evaluation and policy iteration. These integrals are rarely analytically solvable and are typically estimated with the Monte Carlo method, which induces high variance in the estimated policy values and gradients. In this work, we propose to replace Monte Carlo samples with low-discrepancy point sets. We combine policy gradient methods with Randomized Quasi-Monte Carlo, yielding variance-reduced formulations of policy gradient and actor-critic algorithms. These formulations are effective for both policy evaluation and policy improvement: they outperform state-of-the-art algorithms on standardized continuous control benchmarks. Our empirical analyses validate the intuition that replacing Monte Carlo with Quasi-Monte Carlo yields significantly more accurate gradient estimates.
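
To make the core idea concrete, here is a minimal sketch of swapping Monte Carlo samples for a randomized low-discrepancy point set when estimating a policy's expected reward. It is an illustration under stated assumptions, not the authors' implementation: the one-step `reward` function, the one-dimensional Gaussian policy, the base-2 van der Corput sequence, and the random-shift (Cranley-Patterson) randomization are all choices made here for simplicity, and the paper may use different point sets and randomizations.

```python
import random
from statistics import NormalDist, pvariance


def van_der_corput(n, base=2):
    """First n points of the van der Corput low-discrepancy sequence."""
    points = []
    for i in range(n):
        x, denom, k = 0.0, 1.0, i
        while k > 0:
            denom *= base
            k, digit = divmod(k, base)
            x += digit / denom
        points.append(x)
    return points


def reward(action):
    # Hypothetical one-step reward, peaked at action = 1 (an assumption).
    return -(action - 1.0) ** 2


def estimate_value(uniforms, mean=0.0, std=1.0):
    """Average reward under a Gaussian policy, sampled via the inverse CDF."""
    policy = NormalDist(mean, std)
    eps = 1e-12  # keep points strictly inside (0, 1) for inv_cdf
    actions = [policy.inv_cdf(min(max(u, eps), 1.0 - eps)) for u in uniforms]
    return sum(reward(a) for a in actions) / len(actions)


def mc_estimate(n, rng):
    # Plain Monte Carlo: independent uniform samples.
    return estimate_value([rng.random() for _ in range(n)])


def rqmc_estimate(n, rng):
    # Randomized QMC: one random shift (mod 1) applied to the whole point set.
    shift = rng.random()
    return estimate_value([(x + shift) % 1.0 for x in van_der_corput(n)])


rng = random.Random(0)
n, reps = 64, 200
mc_var = pvariance([mc_estimate(n, rng) for _ in range(reps)])
rqmc_var = pvariance([rqmc_estimate(n, rng) for _ in range(reps)])
print(f"MC estimator variance:   {mc_var:.5f}")
print(f"RQMC estimator variance: {rqmc_var:.5f}")
```

Both estimators use the same number of reward evaluations per estimate; only the source of the uniform points differs. The random shift keeps the estimator unbiased (up to the negligible clamping), while the even spacing of the points is what reduces the spread across repetitions in this toy setting.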


