KL-UCB-switch: optimal regret bounds for stochastic bandits from both a distribution-dependent and a distribution-free viewpoints

05/14/2018
by   Aurélien Garivier, et al.

In the context of K-armed stochastic bandits with distributions only assumed to be supported in [0, 1], we introduce a new algorithm, KL-UCB-switch, and prove that it simultaneously enjoys a distribution-free regret bound of optimal order √(KT) and a distribution-dependent regret bound of optimal order as well, that is, matching the ln T lower bound of Lai and Robbins (1985) and Burnetas and Katehakis (1996).
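The abstract does not spell out the index formula, but the algorithm's name suggests its two ingredients: a KL-UCB index (which yields the distribution-dependent ln T rate) and a MOSS-style index (which yields the distribution-free √(KT) rate), with a switch between them based on how often an arm has been pulled. The sketch below is a rough illustration only, assuming Bernoulli rewards; the switch threshold (T/K)^{1/5} and the log(t) exploration term are assumptions for illustration, not taken from the paper.

```python
import math

def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), with clipping."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, pulls, t):
    """KL-UCB index: the largest q with pulls * KL(mean, q) <= log(t),
    found by bisection (KL(mean, q) is increasing in q for q >= mean)."""
    target = math.log(max(t, 2)) / pulls
    lo, hi = mean, 1.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if bernoulli_kl(mean, mid) <= target:
            lo = mid
        else:
            hi = mid
    return lo

def moss_index(mean, pulls, T, K):
    """MOSS index: mean + sqrt(max(log(T / (K * pulls)), 0) / pulls)."""
    return mean + math.sqrt(max(math.log(T / (K * pulls)), 0.0) / pulls)

def kl_ucb_switch_index(mean, pulls, t, T, K):
    """Illustrative switch: KL-UCB while the arm has few pulls,
    MOSS afterwards. The (T/K)^(1/5) threshold is an assumption."""
    if pulls <= (T / K) ** 0.2:
        return kl_ucb_index(mean, pulls, t)
    return moss_index(mean, pulls, T, K)
```

At each round the algorithm would pull the arm maximizing this index over its empirical mean and pull count; the switch lets the index stay aggressive (KL-UCB) on rarely pulled arms while capping worst-case exploration (MOSS) on heavily pulled ones.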


Related research:

06/05/2020 · Adaptation to the Range in K-Armed Bandits
We consider stochastic bandit problems with K arms, each associated with...

03/23/2022 · Minimax Regret for Cascading Bandits
Cascading bandits model the task of learning to rank K out of L items ov...

07/09/2018 · Dynamic Pricing with Finitely Many Unknown Valuations
Motivated by posted price auctions where buyers are grouped in an unknow...

02/23/2017 · A minimax and asymptotically optimal algorithm for stochastic bandits
We propose the kl-UCB ++ algorithm for regret minimization in stochastic...

10/05/2020 · Diversity-Preserving K-Armed Bandits, Revisited
We consider the bandit-based framework for diversity-preserving recommen...

02/25/2019 · Improved Algorithm on Online Clustering of Bandits
We generalize the setting of online clustering of bandits by allowing no...

05/30/2019 · Distribution-dependent and Time-uniform Bounds for Piecewise i.i.d Bandits
We consider the setup of stochastic multi-armed bandits in the case when...
