Batched Multi-armed Bandits Problem

04/03/2019
by   Zijun Gao, et al.

We study the multi-armed bandit problem in the batched setting, where the policy must split the data into a small number of batches. While the minimax regret for two-armed stochastic bandits has been completely characterized by Perchet et al. (2016), the effect of the number of arms on the regret in the multi-armed case is still open. Moreover, the question of whether adaptively chosen batch sizes help to reduce the regret remains underexplored. We propose the BaSE (batched successive elimination) policy, which achieves the rate-optimal regret (within logarithmic factors) for batched multi-armed bandits, and we prove matching lower bounds that hold even if the batch sizes are determined in a data-driven manner.
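To make the batched setting concrete, the following is a minimal sketch of a successive-elimination policy that only updates its set of candidate arms at batch boundaries. It is an illustration under stated assumptions, not the BaSE policy as specified in the paper: the geometric batch grid, the Gaussian unit-variance rewards, the confidence radius sqrt(gamma * log(T * K) / n), and all names (`geometric_grid`, `batched_elimination`, `gamma`) are choices made here for the example.

```python
import math
import random


def geometric_grid(horizon, num_batches):
    """Batch end points t_1 < ... < t_M = horizon (assumed geometric spacing)."""
    base = horizon ** (1.0 / num_batches)
    points = {min(horizon, max(1, math.floor(base ** i)))
              for i in range(1, num_batches + 1)}
    grid = sorted(points)
    grid[-1] = horizon  # the final batch always ends at the horizon
    return grid


def batched_elimination(means, horizon, num_batches, gamma=1.0, seed=0):
    """Simulate a batched elimination policy; returns (regret, active, pulls)."""
    rng = random.Random(seed)
    num_arms = len(means)
    grid = geometric_grid(horizon, num_batches)
    active = list(range(num_arms))
    sums = [0.0] * num_arms
    pulls = [0] * num_arms
    best_mean = max(means)
    regret = 0.0
    t = 0

    def avg(arm):
        return sums[arm] / pulls[arm] if pulls[arm] else 0.0

    def radius(arm):
        # Assumed confidence radius; infinite for arms not yet sampled.
        if pulls[arm] == 0:
            return math.inf
        return math.sqrt(gamma * math.log(horizon * num_arms) / pulls[arm])

    for index, end in enumerate(grid):
        last_batch = index == len(grid) - 1
        if last_batch:
            # Final batch: commit to the empirically best surviving arm.
            leader = max(active, key=avg)
            schedule = [leader] * (end - t)
        else:
            # Earlier batches: pull the surviving arms in round-robin order.
            schedule = [active[i % len(active)] for i in range(end - t)]

        # The whole batch is fixed before any reward of the batch is seen.
        for arm in schedule:
            sums[arm] += rng.gauss(means[arm], 1.0)
            pulls[arm] += 1
            regret += best_mean - means[arm]
        t = end

        if not last_batch:
            # Eliminate arms whose upper bound falls below the best lower bound.
            best_lower = max((avg(a) - radius(a) for a in active if pulls[a]),
                             default=-math.inf)
            active = [a for a in active if avg(a) + radius(a) >= best_lower]

    return regret, active, pulls
```

For example, `batched_elimination([0.9, 0.1, 0.1], horizon=10_000, num_batches=4)` simulates three arms over four batches. The design point the sketch tries to show is that the pull schedule of each batch depends only on data from earlier batches, which is what distinguishes the batched setting from a fully sequential one.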

research
10/11/2019

Batched Multi-Armed Bandits with Optimal Regret

We present a simple and efficient algorithm for the batched stochastic m...
research
08/15/2021

Batched Thompson Sampling for Multi-Armed Bandits

We study Thompson Sampling algorithms for stochastic multi-armed bandits...
research
10/27/2011

The multi-armed bandit problem with covariates

We consider a multi-armed bandit problem in a setting where each arm pro...
research
03/19/2018

What Doubling Tricks Can and Can't Do for Multi-Armed Bandits

An online reinforcement learning algorithm is anytime if it does not nee...
research
11/22/2022

Transfer Learning for Contextual Multi-armed Bandits

Motivated by a range of applications, we study in this paper the problem...
research
02/01/2020

Advances in Bandits with Knapsacks

"Bandits with Knapsacks" is a general model for multi-armed bandits u...
research
10/08/2017

Using the Value of Information to Explore Stochastic, Discrete Multi-Armed Bandits

In this paper, we propose an information-theoretic exploration strategy ...
