Variance Reduction for Evolution Strategies via Structured Control Variates

05/29/2019
by Yunhao Tang, et al.

Evolution Strategies (ES) are a powerful class of blackbox optimization techniques that have recently become a competitive alternative to state-of-the-art policy gradient (PG) algorithms for reinforcement learning (RL). We propose a new method for improving the accuracy of ES algorithms that, as opposed to recent approaches that exploit only the Monte Carlo structure of the gradient estimator, takes advantage of the underlying MDP structure to reduce variance. We observe that the gradient of the ES objective can alternatively be computed using reparametrization and PG estimators, which leads to new control variate techniques for gradient estimation in ES optimization. We provide theoretical insights and show through extensive experiments that this RL-specific variance reduction approach outperforms general-purpose variance reduction methods.
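The key observation above is that two different unbiased estimators of the same ES gradient exist, so their difference has zero mean and can serve as a control variate. The sketch below illustrates only that generic idea on a toy problem; it is not the authors' algorithm. Assumptions: a hypothetical differentiable objective F(x) = -||x||^2 stands in for an RL return, and its analytic gradient stands in for the reparametrization estimator (the abstract indicates the paper obtains that estimator through PG estimators, since RL returns are not directly differentiable). The per-coordinate coefficient `eta` is chosen to minimize the sample variance, a standard control-variate choice, and all function names are illustrative.

```python
import numpy as np

def F(x):
    # Hypothetical toy objective standing in for an RL return: F(x) = -||x||^2
    return -np.sum(x ** 2, axis=-1)

def grad_F(x):
    # Analytic gradient of the toy objective (assumption: not available for real RL returns)
    return -2.0 * x

def es_gradients(theta, sigma, eps):
    """Per-sample vanilla ES estimates: F(theta + sigma * eps) * eps / sigma."""
    values = F(theta + sigma * eps)          # shape (N,)
    return values[:, None] * eps / sigma     # shape (N, d)

def reparam_gradients(theta, sigma, eps):
    """Per-sample reparametrization estimates: grad F(theta + sigma * eps)."""
    return grad_F(theta + sigma * eps)       # shape (N, d)

def control_variate_gradient(theta, sigma, n_samples, rng):
    eps = rng.standard_normal((n_samples, theta.size))
    a = es_gradients(theta, sigma, eps)
    b = reparam_gradients(theta, sigma, eps)
    # Both estimators are unbiased for the gradient of E[F(theta + sigma * eps)],
    # so their difference d has zero mean and can be added with any coefficient.
    d = b - a
    a_c = a - a.mean(axis=0)
    d_c = d - d.mean(axis=0)
    # Per-coordinate coefficient minimizing the sample variance of a + eta * d
    eta = -np.sum(a_c * d_c, axis=0) / np.sum(d_c * d_c, axis=0)
    combined = a + eta * d
    return a, combined

# Usage: compare per-sample variance of vanilla ES and the combined estimator
rng = np.random.default_rng(0)
theta = np.array([1.0, -0.5, 2.0])
vanilla, combined = control_variate_gradient(theta, sigma=0.1, n_samples=1000, rng=rng)
print("true smoothed gradient:", -2.0 * theta)
print("vanilla mean / var:", vanilla.mean(axis=0), vanilla.var(axis=0))
print("combined mean / var:", combined.mean(axis=0), combined.var(axis=0))
```

On this toy problem both per-sample estimators have mean -2·theta, so the combined estimator stays close to the true gradient while its per-sample variance is far below that of vanilla ES. Estimating `eta` from the same batch introduces a small bias; fitting it on a separate batch keeps the combined estimator unbiased.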

Related research

Trajectory-wise Control Variates for Variance Reduction in Policy Gradient Methods
08/08/2019
Policy gradient methods have demonstrated success in reinforcement learn...

Structured Monte Carlo Sampling for Nonisotropic Distributions via Determinantal Point Processes
05/29/2019
We propose a new class of structured methods for Monte Carlo (MC) sampli...

Noise-Reuse in Online Evolution Strategies
04/21/2023
Online evolution strategies have become an attractive alternative to aut...

Policy Gradient Optimal Correlation Search for Variance Reduction in Monte Carlo simulation and Maximum Optimal Transport
07/24/2023
We propose a new algorithm for variance reduction when estimating f(X_T)...

Stochastic Variance Reduction for Deep Q-learning
05/20/2019
Recent advances in deep reinforcement learning have achieved human-level...

Accelerating Reinforcement Learning with a Directional-Gaussian-Smoothing Evolution Strategy
02/21/2020
Evolution strategy (ES) has been shown great promise in many challenging...

Enabling First-Order Gradient-Based Learning for Equilibrium Computation in Markets
03/16/2023
Understanding and analyzing markets is crucial, yet analytical equilibri...
