Meta-Thompson Sampling

by Branislav Kveton, et al.

Efficient exploration in multi-armed bandits is a fundamental online learning problem. In this work, we propose a variant of Thompson sampling that learns to explore better as it interacts with problem instances drawn from an unknown prior distribution. Our algorithm meta-learns the prior, and thus we call it Meta-TS. We propose efficient implementations of Meta-TS and analyze it in Gaussian bandits. Our analysis shows the benefit of meta-learning the prior and is of broader interest, because we derive the first prior-dependent upper bound on the Bayes regret of Thompson sampling. This result is complemented by an empirical evaluation, which shows that Meta-TS quickly adapts to the unknown prior.
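To make the idea concrete, the following is a minimal sketch of meta-learned Thompson sampling in Gaussian bandits, not the authors' implementation. It assumes a hypothetical setup where arm means in each task are drawn from a Gaussian prior with an unknown mean `mu_star`: the agent keeps a Gaussian meta-posterior over `mu_star`, samples a prior from it at the start of each task, runs ordinary Thompson sampling under that sampled prior, and then updates the meta-posterior from the task's observations. All variable names and hyperparameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: K Gaussian arms per task. Task means are drawn as
# theta ~ N(mu_star, sigma_0^2) per arm; rewards are N(theta_k, sigma^2).
# The prior mean mu_star is unknown and is what Meta-TS learns.
K, n_tasks, horizon = 5, 50, 200
sigma, sigma_0, sigma_q = 1.0, 0.5, 1.0    # reward / prior / meta-prior widths
mu_star = rng.normal(0.0, sigma_q, K)       # unknown prior mean (per arm)

# Meta-posterior over mu_star, initialized to the meta-prior N(0, sigma_q^2).
mu_q = np.zeros(K)
var_q = np.full(K, sigma_q**2)

for task in range(n_tasks):
    theta = rng.normal(mu_star, sigma_0)       # a fresh bandit instance
    mu_hat = rng.normal(mu_q, np.sqrt(var_q))  # Meta-TS: sample a prior mean

    # Run Thompson sampling under the sampled prior N(mu_hat, sigma_0^2).
    post_mean = mu_hat.copy()
    post_var = np.full(K, sigma_0**2)
    sums, counts = np.zeros(K), np.zeros(K)
    for t in range(horizon):
        a = int(np.argmax(rng.normal(post_mean, np.sqrt(post_var))))
        r = rng.normal(theta[a], sigma)
        sums[a] += r
        counts[a] += 1
        # Standard Gaussian posterior update for the played arm.
        post_var[a] = 1.0 / (1.0 / sigma_0**2 + counts[a] / sigma**2)
        post_mean[a] = post_var[a] * (mu_hat[a] / sigma_0**2
                                      + sums[a] / sigma**2)

    # Update the meta-posterior from this task's empirical arm means.
    # The sample mean of arm k is distributed around mu_star[k] with
    # variance sigma_0^2 + sigma^2 / counts[k].
    for k in range(K):
        if counts[k] > 0:
            obs_var = sigma_0**2 + sigma**2 / counts[k]
            new_var = 1.0 / (1.0 / var_q[k] + 1.0 / obs_var)
            mu_q[k] = new_var * (mu_q[k] / var_q[k]
                                 + (sums[k] / counts[k]) / obs_var)
            var_q[k] = new_var
```

As tasks accumulate, `var_q` shrinks and the sampled priors concentrate around the true `mu_star`, which is the mechanism behind the adaptation the abstract describes.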


Meta-Learning for Simple Regret Minimization

We develop a meta-learning framework for simple regret minimization in b...

Online Meta-Learning in Adversarial Multi-Armed Bandits

We study meta-learning for adversarial multi-armed bandits. We consider ...

Nonstochastic Bandits with Infinitely Many Experts

We study the problem of nonstochastic bandits with infinitely many exper...

Bayesian decision-making under misspecified priors with applications to meta-learning

Thompson sampling and other Bayesian sequential decision-making algorith...

Neural Collaborative Filtering Bandits via Meta Learning

Contextual multi-armed bandits provide powerful tools to solve the explo...

AutoML for Contextual Bandits

Contextual Bandits is one of the widely popular techniques used in appli...

Meta Dynamic Pricing: Learning Across Experiments

We study the problem of learning across a sequence of price experiments ...