The multi-armed bandit problem with covariates

10/27/2011
by Vianney Perchet, et al.

We consider a multi-armed bandit problem in a setting where each arm produces a noisy reward realization that depends on an observable random covariate. Unlike the traditional static multi-armed bandit problem, this setting allows for dynamically changing rewards, which better describe applications where side information is available. We adopt a nonparametric model in which the expected rewards are smooth functions of the covariate and the hardness of the problem is captured by a margin parameter. To maximize the expected cumulative reward, we introduce a policy called Adaptively Binned Successive Elimination (ABSE) that adaptively decomposes the global problem into suitably "localized" static bandit problems. This policy constructs an adaptive partition using a variant of the Successive Elimination (SE) policy. Our results include sharper regret bounds for the SE policy in a static bandit problem and minimax-optimal regret bounds for the ABSE policy in the dynamic problem.
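To make the two building blocks concrete, here is a minimal Python sketch of Successive Elimination on a static bandit, run independently inside each bin of a *fixed*, equal-width partition of the covariate space [0, 1]. It is not the ABSE policy: the partition is never refined adaptively, and the Hoeffding confidence radius, the names `SuccessiveElimination`, `BinnedSE` and `simulate`, and the toy reward functions are all illustrative assumptions rather than the paper's definitions.

```python
import math
import random


class SuccessiveElimination:
    """Successive Elimination (SE) state for one static K-armed bandit.

    Active arms are pulled round-robin (least-pulled first). When every
    active arm has been pulled equally often, any arm whose upper confidence
    bound is below the largest lower confidence bound is eliminated.
    The Hoeffding radius with a union bound over arms and pulls is an
    illustrative assumption, not the radius used in the paper.
    """

    def __init__(self, n_arms, horizon, delta=0.05):
        self.active = list(range(n_arms))
        self.sums = [0.0] * n_arms
        self.counts = [0] * n_arms
        # log(2 K T / delta): union bound over K arms and at most T pulls.
        self.log_term = math.log(2 * n_arms * horizon / delta)

    def mean(self, arm):
        return self.sums[arm] / self.counts[arm]

    def radius(self, arm):
        # Hoeffding confidence radius for rewards bounded in [0, 1].
        return math.sqrt(self.log_term / (2 * self.counts[arm]))

    def select(self):
        # Round-robin: pull the active arm sampled the fewest times.
        return min(self.active, key=lambda a: self.counts[a])

    def update(self, arm, reward):
        self.sums[arm] += reward
        self.counts[arm] += 1
        # Test for elimination only once a full round is complete.
        if len({self.counts[a] for a in self.active}) > 1:
            return
        best_lcb = max(self.mean(a) - self.radius(a) for a in self.active)
        self.active = [a for a in self.active
                       if self.mean(a) + self.radius(a) >= best_lcb]


class BinnedSE:
    """Hypothetical fixed-partition policy: covariates in [0, 1] are mapped to
    n_bins equal-width bins, each running its own independent SE instance.
    Unlike ABSE, the partition is never refined."""

    def __init__(self, n_arms, n_bins, horizon, delta=0.05):
        self.n_bins = n_bins
        self.bins = [SuccessiveElimination(n_arms, horizon, delta)
                     for _ in range(n_bins)]

    def bin_of(self, x):
        return min(int(x * self.n_bins), self.n_bins - 1)

    def select(self, x):
        return self.bins[self.bin_of(x)].select()

    def update(self, x, arm, reward):
        self.bins[self.bin_of(x)].update(arm, reward)


def simulate(horizon=20000, n_bins=8, seed=0):
    """Run BinnedSE on a toy two-armed problem with uniform covariates."""
    rng = random.Random(seed)
    # Illustrative smooth mean-reward functions: arm 0 is better for
    # x < 0.5 and arm 1 for x > 0.5 (they cross at x = 0.5).
    means = [lambda x: 0.5 + 0.3 * (0.5 - x),
             lambda x: 0.5 + 0.3 * (x - 0.5)]
    policy = BinnedSE(n_arms=2, n_bins=n_bins, horizon=horizon)
    regret = 0.0
    for _ in range(horizon):
        x = rng.random()
        arm = policy.select(x)
        # Bernoulli reward with covariate-dependent mean.
        reward = 1.0 if rng.random() < means[arm](x) else 0.0
        policy.update(x, arm, reward)
        regret += max(m(x) for m in means) - means[arm](x)
    return policy, regret


if __name__ == "__main__":
    policy, regret = simulate()
    print([b.active for b in policy.bins], round(regret, 1))
```

In this toy setup, bins far from x = 0.5 quickly eliminate the inferior arm, while bins next to x = 0.5, where the two reward functions are close, tend to keep both arms active. That region of small gaps is exactly where the margin parameter and a finer, adaptively chosen partition such as ABSE's would matter.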

Related research

Batched Multi-armed Bandits Problem (04/03/2019)
In this paper, we study the multi-armed bandit problem in the batched se...

Functional Sequential Treatment Allocation with Covariates (01/29/2020)
We consider a multi-armed bandit problem with covariates. Given a realiz...

On the Hardness of Inventory Management with Censored Demand Data (10/16/2017)
We consider a repeated newsvendor problem where the inventory manager ha...

One-bit feedback is sufficient for upper confidence bound policies (12/04/2020)
We consider a variant of the traditional multi-armed bandit problem in w...

The Assistive Multi-Armed Bandit (01/24/2019)
Learning preferences implicit in the choices humans make is a well studi...

The Survival Bandit Problem (06/07/2022)
We study the survival bandit problem, a variant of the multi-armed bandi...

ICQ: A Quantization Scheme for Best-Arm Identification Over Bit-Constrained Channels (04/30/2023)
We study the problem of best-arm identification in a distributed variant...