Thompson Sampling with a Mixture Prior

06/10/2021
by Joey Hong, et al.

We study Thompson sampling (TS) in online decision-making problems where the uncertain environment is sampled from a mixture distribution. This is relevant to multi-task settings, where a learning agent is faced with different classes of problems. We incorporate this structure in a natural way by initializing TS with a mixture prior – dubbed MixTS – and develop a novel, general technique for analyzing the regret of TS with such priors. We apply this technique to derive Bayes regret bounds for MixTS in both linear bandits and tabular Markov decision processes (MDPs). Our regret bounds reflect the structure of the problem and depend on the number of components and confidence width of each component of the prior. Finally, we demonstrate the empirical effectiveness of MixTS in both synthetic and real-world experiments.
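To make the idea concrete, below is a minimal sketch (not the paper's own code) of Thompson sampling with a mixture prior for a linear bandit with Gaussian rewards. It assumes a Gaussian mixture prior with K components and unit reward noise so that each component posterior stays Gaussian and the mixture weights can be reweighted by predictive likelihoods; all variable names, the fixed action set, and the dimensions are illustrative assumptions.

```python
import numpy as np

# Sketch of MixTS for a linear bandit with a K-component Gaussian mixture prior.
# Rewards are modeled as a @ theta_true + N(0, 1); all quantities are illustrative.

rng = np.random.default_rng(0)
d, K, T = 2, 3, 500                       # feature dimension, mixture components, horizon
actions = rng.normal(size=(10, d))        # fixed action set of 10 feature vectors
theta_true = np.array([1.0, -0.5])        # unknown environment parameter

# Mixture prior: weights w_k and Gaussian components N(mu_k, Sigma_k).
w = np.full(K, 1.0 / K)
mus = [rng.normal(size=d) for _ in range(K)]
Sigmas = [np.eye(d) for _ in range(K)]
# Per-component natural parameters (precision matrix and precision-weighted mean).
precisions = [np.linalg.inv(S) for S in Sigmas]
biases = [P @ m for P, m in zip(precisions, mus)]
log_w = np.log(w)

for t in range(T):
    # Sample a component from the current mixture weights,
    # then a parameter from that component's Gaussian posterior.
    probs = np.exp(log_w - np.logaddexp.reduce(log_w))
    k = rng.choice(K, p=probs)
    cov_k = np.linalg.inv(precisions[k])
    theta = rng.multivariate_normal(cov_k @ biases[k], cov_k)

    # Act greedily with respect to the sampled parameter and observe a reward.
    a = actions[np.argmax(actions @ theta)]
    reward = a @ theta_true + rng.normal()

    # Conjugate update of every component posterior and of the mixture weights.
    for j in range(K):
        cov_j = np.linalg.inv(precisions[j])
        mean_j = cov_j @ biases[j]
        # Predictive likelihood of the observed reward under component j,
        # computed before the update, reweights the mixture.
        pred_var = a @ cov_j @ a + 1.0
        log_w[j] += -0.5 * (np.log(2 * np.pi * pred_var)
                            + (reward - a @ mean_j) ** 2 / pred_var)
        precisions[j] += np.outer(a, a)
        biases[j] += reward * a
```

The sampled-component step is what distinguishes this from standard TS with a single Gaussian prior: as data accumulate, the mixture weights concentrate on the component (task class) that best explains the observed rewards.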


