Thompson sampling for improved exploration in GFlowNets

06/30/2023
by Jarrid Rector-Brooks, et al.

Generative flow networks (GFlowNets) are amortized variational inference algorithms that treat sampling from a distribution over compositional objects as a sequential decision-making problem with a learnable action policy. Unlike other algorithms for hierarchical sampling that optimize a variational bound, GFlowNet algorithms can stably run off-policy, which can be advantageous for discovering modes of the target distribution. Despite this flexibility in the choice of behaviour policy, the optimal way of efficiently selecting trajectories for training has not yet been systematically explored. In this paper, we view the choice of trajectories for training as an active learning problem and approach it using Bayesian techniques inspired by methods for multi-armed bandits. The proposed algorithm, Thompson sampling GFlowNets (TS-GFN), maintains an approximate posterior distribution over policies and samples trajectories from this posterior for training. We show in two domains that TS-GFN yields improved exploration and thus faster convergence to the target distribution than the off-policy exploration strategies used in past work.
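For readers unfamiliar with the bandit technique the abstract refers to, here is a minimal sketch of classical Thompson sampling on a Bernoulli multi-armed bandit. It is not the TS-GFN algorithm and is not taken from the paper: it only illustrates the underlying idea of keeping a posterior, drawing one sample from it, and acting greedily with respect to that sample. The function name `thompson_sampling`, the Beta(1, 1) priors, and the arm probabilities in the usage note are all illustrative assumptions.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling (illustrative sketch, not TS-GFN).

    true_probs: hidden success probability of each arm (assumed for the demo).
    Returns the pull counts and the final Beta posterior parameters.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    # Beta(1, 1) prior on every arm, i.e. uniform over [0, 1] (an assumption).
    alpha = [1.0] * n_arms
    beta = [1.0] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one plausible success rate per arm from the current posterior.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n_arms)]
        # Act greedily with respect to the sampled rates.
        arm = max(range(n_arms), key=lambda i: samples[i])
        # Observe a Bernoulli reward from the chosen arm.
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate posterior update for the chosen arm only.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1

    return pulls, alpha, beta
```

Calling `thompson_sampling([0.1, 0.5, 0.9], 2000)` returns per-arm pull counts, which should concentrate on the arm with the highest success probability as the posterior sharpens. In the abstract's setting the posterior would instead be over policies, and a sampled policy would generate training trajectories; how that posterior is approximated is not specified in the text above.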


