Optimality of Thompson Sampling with Noninformative Priors for Pareto Bandits

by Jongyeong Lee, et al.

In the stochastic multi-armed bandit problem, a randomized probability matching policy called Thompson sampling (TS) has shown excellent performance in various reward models. In addition to its empirical performance, TS has been shown to achieve asymptotic problem-dependent lower bounds in several models. However, its optimality has mainly been addressed under light-tailed or one-parameter models that belong to exponential families. In this paper, we consider the optimality of TS for the Pareto model, which has a heavy tail and is parameterized by two unknown parameters. Specifically, we discuss the optimality of TS with probability matching priors that include the Jeffreys prior and the reference priors. We first prove that TS with certain probability matching priors can achieve the optimal regret bound. Then, we show the suboptimality of TS with other priors, including the Jeffreys and the reference priors. Nevertheless, we find that TS with the Jeffreys and reference priors can achieve the asymptotic lower bound if one uses a truncation procedure. These results suggest that noninformative priors should be chosen carefully to avoid suboptimality, and they show the effectiveness of truncation procedures in TS-based policies.
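
As a rough illustration of the kind of policy the abstract describes, the following Python sketch runs Thompson sampling on Pareto arms with unknown scale κ and shape α. It assumes a prior of the form π(κ, α) ∝ α^(-k)/κ for an integer k, which is an illustrative choice that the abstract does not spell out. Under that assumption, the posterior of α is a Gamma distribution and κ given α can be sampled by inverting its CDF. The function names, the number of initial pulls, and the default k are hypothetical, and the truncation procedure mentioned above is not implemented.

```python
import math
import random


def sample_pareto(kappa, alpha, rng):
    """Draw one reward from Pareto(scale=kappa, shape=alpha) by inverse CDF."""
    u = 1.0 - rng.random()  # uniform on (0, 1]
    return kappa / u ** (1.0 / alpha)


def posterior_mean_sample(rewards, k, rng):
    """Sample (kappa, alpha) from the posterior and return the implied mean.

    Assumption (not from the abstract): prior pi(kappa, alpha) ∝ alpha**(-k) / kappa.
    With n rewards and m = min(rewards), this gives
      alpha         ~ Gamma(shape = n - k, rate = sum(log(x / m)))
      kappa | alpha has CDF (kappa / m) ** (n * alpha) on (0, m].
    """
    n = len(rewards)
    m = min(rewards)
    rate = sum(math.log(x / m) for x in rewards)
    # random.gammavariate takes (shape, scale), so pass scale = 1 / rate.
    alpha = rng.gammavariate(n - k, 1.0 / rate)
    if alpha <= 1.0:
        return math.inf  # a Pareto distribution with shape <= 1 has infinite mean
    u = 1.0 - rng.random()
    kappa = m * u ** (1.0 / (n * alpha))
    return kappa * alpha / (alpha - 1.0)


def run_thompson_sampling(arms, horizon, k=2, seed=0):
    """Play `horizon` rounds; `arms` is a list of true (kappa, alpha) pairs.

    Returns the number of pulls of each arm.
    """
    rng = random.Random(seed)
    # Hypothetical warm-up: enough pulls for a proper Gamma posterior
    # (n - k >= 1) and a positive rate (n >= 2).
    n_init = max(2, k + 1)
    history = [
        [sample_pareto(kap, a, rng) for _ in range(n_init)] for kap, a in arms
    ]
    for _ in range(horizon - n_init * len(arms)):
        # Probability matching: play the arm whose sampled mean is largest.
        indices = [posterior_mean_sample(h, k, rng) for h in history]
        best = max(range(len(arms)), key=lambda i: indices[i])
        kap, a = arms[best]
        history[best].append(sample_pareto(kap, a, rng))
    return [len(h) for h in history]
```

For example, `run_thompson_sampling([(1.0, 3.0), (1.0, 1.5)], horizon=2000)` returns the pull count of each arm. The second arm has the larger mean (κα/(α-1) = 3 versus 1.5), so a well-behaved policy should favor it over time. Varying `k` changes the prior within the assumed family and gives a way to experiment with how sensitive the policy is to that choice.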


Pareto Regret Analyses in Multi-objective Multi-armed Bandit

We study Pareto optimality in multi-objective multi-armed bandit by prov...

Asymptotically Optimal Thompson Sampling Based Policy for the Uniform Bandits and the Gaussian Bandits

Thompson sampling (TS) for the parametric stochastic multi-armed bandits...

A Simple and Optimal Policy Design with Safety against Heavy-tailed Risk for Multi-armed Bandits

We design new policies that ensure both worst-case optimality for expect...

Learning Approximately Objective Priors

Informative Bayesian priors are often difficult to elicit, and when this...

Thompson Sampling for 1-Dimensional Exponential Family Bandits

Thompson Sampling has been demonstrated in many complex bandit models, h...

Existence of matching priors on compact spaces

A matching prior at level 1-α is a prior such that an associated 1-α cre...

Indexability is Not Enough for Whittle: Improved, Near-Optimal Algorithms for Restless Bandits

We study the problem of planning restless multi-armed bandits (RMABs) wi...
