Asymptotically Optimal Multi-Armed Bandit Policies under a Cost Constraint

09/09/2015
by   Apostolos N. Burnetas, et al.

We develop asymptotically optimal policies for the multi-armed bandit (MAB) problem under a cost constraint. This model applies to situations where each sample (or activation) from a population (bandit) incurs a known, bandit-dependent cost. Successive samples from each population are i.i.d. random variables with unknown distribution. The objective is to design a feasible policy for deciding which population to sample from, so as to maximize the expected sum of outcomes of n total samples, or equivalently to minimize the regret due to the lack of information on the sample distributions. For this problem we consider the class of feasible uniformly fast (f-UF) convergent policies, which satisfy the cost constraint sample-pathwise. We first establish a necessary asymptotic lower bound on the rate of increase of the regret function of f-UF policies. We then construct a class of f-UF policies and provide conditions under which they are asymptotically optimal within the class of f-UF policies, achieving this asymptotic lower bound. Finally, we provide the explicit form of such policies for the case in which the unknown distributions are Normal with unknown means and known variances.
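To make the setting concrete, the following is a minimal sketch of an index-based sampling rule for Normal arms with known variances under a sample-path cost constraint. It is an illustration of the general idea (UCB-style indices restricted to budget-feasible arms), not the paper's actual asymptotically optimal policy; the index form, the budget parameter `budget_rate`, and all function names are assumptions made for this example.

```python
import math
import random

def ucb_index(mean_est, sigma, pulls, t):
    """Upper-confidence index for a Normal arm with known std. dev. sigma.
    This particular inflation term is illustrative, not the paper's index."""
    return mean_est + sigma * math.sqrt(2.0 * math.log(t) / pulls)

def constrained_ucb(arms, costs, budget_rate, horizon, seed=0):
    """At each step t, among arms whose activation keeps cumulative cost
    within budget_rate * t (a hypothetical sample-path constraint), pull
    the feasible arm with the highest index.

    arms  : list of (true_mean, sigma) pairs for the simulated bandits
    costs : known activation cost of each arm
    """
    rng = random.Random(seed)
    k = len(arms)
    pulls = [0] * k          # number of activations per arm
    sums = [0.0] * k         # running sum of observed rewards per arm
    spent = 0.0              # cumulative cost paid so far
    total = 0.0              # cumulative reward collected
    for t in range(1, horizon + 1):
        feasible = [i for i in range(k) if spent + costs[i] <= budget_rate * t]
        if not feasible:
            continue         # idle this round to respect the budget sample-pathwise
        untried = [i for i in feasible if pulls[i] == 0]
        if untried:
            i = untried[0]   # activate each arm once before using indices
        else:
            i = max(feasible,
                    key=lambda j: ucb_index(sums[j] / pulls[j], arms[j][1],
                                            pulls[j], t))
        x = rng.gauss(arms[i][0], arms[i][1])
        pulls[i] += 1
        sums[i] += x
        spent += costs[i]
        total += x
    return total, pulls, spent
```

With two arms of means 1.0 and 0.5 (unit variance, equal costs), the better arm ends up activated far more often while the cumulative cost never exceeds the budget at any point along the sample path.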


Related research:

11/30/2018
Asymptotically Optimal Multi-Armed Bandit Activation Policies under Side Constraints
This paper introduces the first asymptotically optimal strategy for the ...

01/19/2012
Adaptive Policies for Sequential Sampling under Incomplete Information and a Cost Constraint
We consider the problem of sequential sampling from a finite number of i...

10/07/2015
Asymptotically Optimal Sequential Experimentation Under Generalized Ranking
We consider the classical problem of a controller activating (or samplin...

04/22/2015
Normal Bandits of Unknown Means and Variances: Asymptotic Optimality, Finite Horizon Regret Bounds, and a Solution to an Open Problem
Consider the problem of sampling sequentially from a finite number of N ...

05/08/2015
An Asymptotically Optimal Policy for Uniform Bandits of Unknown Support
Consider the problem of a controller sampling sequentially from a finite...

07/25/2020
Sequential Multi-hypothesis Testing in Multi-armed Bandit Problems: An Approach for Asymptotic Optimality
We consider a multi-hypothesis testing problem involving a K-armed bandi...

06/08/2022
Uplifting Bandits
We introduce a multi-armed bandit model where the reward is a sum of mul...
