Thompson Sampling with a Mixture Prior

by Joey Hong, et al.

We study Thompson sampling (TS) in online decision-making problems where the uncertain environment is sampled from a mixture distribution. This is relevant to multi-task settings, where a learning agent is faced with different classes of problems. We incorporate this structure in a natural way by initializing TS with a mixture prior, dubbed MixTS, and develop a novel, general technique for analyzing the regret of TS with such priors. We apply this technique to derive Bayes regret bounds for MixTS in both linear bandits and tabular Markov decision processes (MDPs). Our regret bounds reflect the structure of the problem and depend on the number of components and confidence width of each component of the prior. Finally, we demonstrate the empirical effectiveness of MixTS in both synthetic and real-world experiments.
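The abstract describes initializing Thompson sampling with a mixture prior. The following is a minimal, hypothetical sketch of that idea for a Gaussian multi-armed bandit: the prior over the arm-mean vector is a mixture of diagonal Gaussians, each round samples a component from the posterior mixture weights and then a parameter from that component's conjugate posterior, and the weights are updated with each component's predictive likelihood. All numbers (arms, components, variances, true means) are illustrative and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 2-armed Gaussian bandit with unit noise variance.
# The prior over the mean-reward vector theta is a mixture of M diagonal
# Gaussian components; all constants here are illustrative.
K, M = 2, 2
mus = np.array([[1.0, 0.0], [0.0, 1.0]])  # component prior means, shape (M, K)
var0 = np.full((M, K), 1.0)               # component prior variances
log_w = np.log(np.full(M, 1.0 / M))       # log mixture weights
noise_var = 1.0

theta_true = np.array([1.0, 0.2])         # the sampled environment

# Per-arm sufficient statistics: pull counts and reward sums.
n = np.zeros(K)
s = np.zeros(K)

for t in range(500):
    # Conjugate Gaussian posterior of each component given the data so far.
    post_var = 1.0 / (1.0 / var0 + n / noise_var)       # shape (M, K)
    post_mu = post_var * (mus / var0 + s / noise_var)   # shape (M, K)

    # Mixture-prior TS: first sample a component from the posterior
    # mixture weights, then sample theta from that component's posterior.
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    m = rng.choice(M, p=w)
    theta = rng.normal(post_mu[m], np.sqrt(post_var[m]))
    a = int(np.argmax(theta))

    # Observe a noisy reward and update the statistics for the pulled arm.
    r = rng.normal(theta_true[a], np.sqrt(noise_var))
    n[a] += 1
    s[a] += r

    # Reweight components by their predictive likelihood of the reward.
    pred_var = post_var[:, a] + noise_var
    log_w += (-0.5 * np.log(2 * np.pi * pred_var)
              - 0.5 * (r - post_mu[:, a]) ** 2 / pred_var)
    log_w -= log_w.max()
```

Over time the posterior mixture weights concentrate on the component that best explains the observed rewards, so the agent exploits the multi-task structure rather than starting from an uninformative prior.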


