Adaptive Correlated Monte Carlo for Contextual Categorical Sequence Generation

12/31/2019
by   Xinjie Fan, et al.

Sequence generation models are commonly refined with reinforcement learning over user-defined metrics. However, high gradient variance hinders the practical use of this approach. To stabilize training, we adapt a policy gradient estimator to the contextual generation of categorical sequences; the estimator evaluates a set of correlated Monte Carlo (MC) rollouts for variance control. Because of the correlation, the number of unique rollouts is random and adapts to model uncertainty; the rollouts naturally serve as baselines for each other and are combined to reduce gradient variance. We also demonstrate the use of correlated MC rollouts for binary-tree softmax models, which reduce the high generation cost in large-vocabulary scenarios by decomposing each categorical action into a sequence of binary actions. We evaluate our methods on both neural program synthesis and image captioning. The proposed methods yield lower gradient variance and consistent improvement over related baselines.
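To make the binary-tree softmax idea concrete, here is a minimal sketch of how one categorical action over a vocabulary of size 2^depth can be decomposed into `depth` binary actions. The abstract does not specify the tree layout, so everything here is an assumption for illustration: a complete balanced tree, heap-style node indexing, and a fixed list `node_logits` standing in for the context-dependent outputs of a model. The names `leaf_probability` and `sample_token` are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def leaf_probability(node_logits, token, depth):
    """Probability of `token` under a complete binary tree of the given depth.

    Assumed layout: internal nodes are stored heap-style, with the root at
    index 1 and the children of node n at 2n (left) and 2n + 1 (right).
    Index 0 is unused, so node_logits needs 2**depth entries.
    """
    prob = 1.0
    node = 1
    # Read the token's bits from most to least significant: each bit is one
    # binary action (0 = go left, 1 = go right).
    for level in reversed(range(depth)):
        bit = (token >> level) & 1
        p_right = sigmoid(node_logits[node])
        prob *= p_right if bit else 1.0 - p_right
        node = 2 * node + bit
    return prob


def sample_token(node_logits, depth, rng):
    """Draw a token with `depth` binary decisions instead of one softmax
    over all 2**depth tokens."""
    node = 1
    for _ in range(depth):
        bit = 1 if rng.random() < sigmoid(node_logits[node]) else 0
        node = 2 * node + bit
    # After `depth` steps the node index is 2**depth + token.
    return node - (1 << depth)
```

Under these assumptions, sampling a token touches only `depth` nodes rather than the whole vocabulary, which is where the reduced generation cost for large vocabularies comes from.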


research
08/08/2019

Trajectory-wise Control Variates for Variance Reduction in Policy Gradient Methods

Policy gradient methods have demonstrated success in reinforcement learn...
research
06/28/2020

Deep Bayesian Quadrature Policy Optimization

We study the problem of obtaining accurate policy gradient estimates. Th...
research
01/31/2023

Improving Monte Carlo Evaluation with Offline Data

Monte Carlo (MC) methods are the most widely used methods to estimate th...
research
02/16/2022

Policy Learning and Evaluation with Randomized Quasi-Monte Carlo

Reinforcement learning constantly deals with hard integrals, for example...
research
07/24/2023

Policy Gradient Optimal Correlation Search for Variance Reduction in Monte Carlo simulation and Maximum Optimal Transport

We propose a new algorithm for variance reduction when estimating f(X_T)...
research
06/19/2023

Multilevel Surrogate-based Control Variates

Monte Carlo (MC) sampling is a popular method for estimating the statist...
research
05/04/2019

ARSM: Augment-REINFORCE-Swap-Merge Estimator for Gradient Backpropagation Through Categorical Variables

To address the challenge of backpropagating the gradient through categor...
