The Fragility of Optimized Bandit Algorithms

by Lin Fan et al.

Much of the literature on optimal design of bandit algorithms is based on minimization of expected regret. It is well known that designs that are optimal over certain exponential families can achieve expected regret that grows logarithmically in the number of arm plays, at a rate governed by the Lai-Robbins lower bound. In this paper, we show that when one uses such optimized designs, the associated algorithms necessarily have the undesirable feature that the tail of the regret distribution behaves like that of a truncated Cauchy distribution. Furthermore, for p > 1, the p-th moment of the regret distribution grows much faster than poly-logarithmically, in particular as a power of the number of sub-optimal arm plays. We show that optimized Thompson sampling and UCB bandit designs are also fragile, in the sense that when the problem is even slightly mis-specified, the regret can grow much faster than the conventional theory suggests. Our arguments are based on standard change-of-measure ideas, and indicate that the most likely way for regret to become larger than expected is for the optimal arm to return below-average rewards in its first few plays, making it appear sub-optimal and thereby causing the algorithm to sample a truly sub-optimal arm far more often than would be optimal.
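The mechanism described above, in which an unlucky early run of rewards from the optimal arm causes the algorithm to over-play a sub-optimal arm, can be observed directly in simulation. The following is a minimal sketch, not the authors' code: Thompson sampling on a two-armed Bernoulli bandit with Beta(1,1) priors, where the function names (`thompson_bernoulli`, `regret`) and the specific arm means are illustrative assumptions.

```python
import numpy as np

def thompson_bernoulli(means, horizon, rng):
    """Thompson sampling for Bernoulli bandits with Beta(1,1) priors.

    Returns the pull count of each arm over `horizon` rounds, so the
    caller can compute regret from the gaps to the best mean.
    """
    k = len(means)
    successes = np.zeros(k)
    failures = np.zeros(k)
    pulls = np.zeros(k, dtype=int)
    for _ in range(horizon):
        # Sample a mean estimate from each arm's Beta posterior
        # and play the arm with the largest sample.
        theta = rng.beta(successes + 1, failures + 1)
        arm = int(np.argmax(theta))
        reward = rng.random() < means[arm]
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

def regret(means, pulls):
    """Pseudo-regret: sum over arms of (best mean - arm mean) * pulls."""
    gaps = max(means) - np.asarray(means)
    return float(np.dot(gaps, pulls))
```

Running many independent replications and inspecting the empirical distribution of `regret` shows the effect the abstract describes: most runs incur small regret, but in the rare replications where the better arm happens to start with a run of zeros, the algorithm commits to the worse arm for a long stretch, producing the heavy right tail.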

