Optimal UCB Adjustments for Large Arm Sizes

09/05/2019
by Hock Peng Chan, et al.

The regret lower bound of Lai and Robbins (1985), the gold standard for checking optimality of bandit algorithms, treats the number of arms as fixed while the sample size goes to infinity. We show that when the number of arms grows polynomially with the sample size, a smaller lower bound is, surprisingly, achievable. This is because the larger experimentation cost incurred when there are more arms permits regret savings from exploiting the best performer more often. In particular, we construct a UCB-Large algorithm that adaptively exploits more when there are more arms; it achieves the smaller lower bound and is thus optimal. Numerical experiments show that UCB-Large outperforms both classical UCB, which does not correct for arm size, and Thompson sampling.
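The paper defines the exact arm-size-dependent index; as a rough illustration of the idea only, here is a minimal Python sketch in which the exploration bonus shrinks as the number of arms K grows, using a hypothetical log(t/K) term in place of classical UCB's log(t). The function names and the specific form of the correction are assumptions made for this sketch, not the algorithm from the paper.

```python
import math
import random

def ucb_index(mean, pulls, t, n_arms, correct_for_arm_size=True):
    """UCB index for one arm.

    With correct_for_arm_size=True, the exploration bonus uses a
    hypothetical log(t / K) term, so exploitation increases as the
    number of arms K grows. The paper's actual index may differ.
    """
    if pulls == 0:
        return float("inf")  # force each arm to be tried once
    if correct_for_arm_size:
        log_term = math.log(max(t / n_arms, math.e))
    else:
        log_term = math.log(t)  # classical UCB1-style bonus
    return mean + math.sqrt(2.0 * log_term / pulls)

def run_bandit(true_means, horizon, correct_for_arm_size=True, seed=0):
    """Simulate Gaussian-reward bandit play and return cumulative regret."""
    rng = random.Random(seed)
    k = len(true_means)
    pulls = [0] * k
    sums = [0.0] * k
    best = max(true_means)
    regret = 0.0
    for t in range(1, horizon + 1):
        # Pick the arm with the highest index.
        idx = max(range(k), key=lambda i: ucb_index(
            sums[i] / pulls[i] if pulls[i] else 0.0,
            pulls[i], t, k, correct_for_arm_size))
        reward = rng.gauss(true_means[idx], 1.0)  # unit-variance Gaussian rewards
        pulls[idx] += 1
        sums[idx] += reward
        regret += best - true_means[idx]
    return regret

if __name__ == "__main__":
    means = [i / 100.0 for i in range(100)]  # 100 arms, best mean 0.99
    for corrected in (True, False):
        label = "arm-size corrected" if corrected else "classical UCB"
        print(label, run_bandit(means, horizon=20_000, correct_for_arm_size=corrected))
```

Under this stand-in correction, the bonus is smaller for every arm when K is large, so the algorithm spends less on exploration per arm, which is the qualitative behavior the abstract describes.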


