Normal Bandits of Unknown Means and Variances: Asymptotic Optimality, Finite Horizon Regret Bounds, and a Solution to an Open Problem

04/22/2015
by Wesley Cowan, et al.

Consider the problem of sampling sequentially from a finite number N ≥ 2 of populations, specified by random variables X^i_k, i = 1, ..., N, k = 1, 2, ..., where X^i_k denotes the outcome from population i the k-th time it is sampled. It is assumed that for each fixed i, {X^i_k}_{k ≥ 1} is a sequence of i.i.d. normal random variables with unknown mean μ_i and unknown variance σ_i^2. The objective is to find a policy π for deciding which of the N populations to sample from at each time n = 1, 2, ..., so as to maximize the expected sum of outcomes of n samples or, equivalently, to minimize the regret due to lack of information about the parameters μ_i and σ_i^2. In this paper, we present a simple inflated sample mean (ISM) index policy that is asymptotically optimal in the sense of Theorem 4 below. This resolves a standing open problem from Burnetas and Katehakis (1996). Additionally, finite horizon regret bounds are given.
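To make the setup concrete, the sketch below simulates sequential sampling from N normal populations under an inflated-sample-mean style index rule and measures regret through the standard decomposition R(n) = Σ_i (μ* − μ_i) T_i(n), where μ* = max_i μ_i and T_i(n) is the number of times population i was sampled in the first n rounds. The abstract does not state the ISM index itself, so the inflation term used here, S_i · sqrt(n^{2/(T_i − 2)} − 1) with a biased (1/T_i) sample standard deviation S_i and three initial samples per population, is an assumption to be checked against the full text. The function names `ism_index`, `run_policy`, and `pseudo_regret` are hypothetical.

```python
import math
import random


def ism_index(mean, sd, t, n):
    """ASSUMED inflated-sample-mean index: the sample mean plus an inflation
    term that shrinks as t (samples of this population) grows and grows
    slowly with n (total samples so far). Requires t >= 3."""
    return mean + sd * math.sqrt(n ** (2.0 / (t - 2)) - 1.0)


def run_policy(mus, sigmas, horizon, seed=0):
    """Simulate the index rule on N normal populations with true means `mus`
    and standard deviations `sigmas`; return the counts T_i(horizon)."""
    rng = random.Random(seed)
    N = len(mus)
    counts = [0] * N
    sums = [0.0] * N
    sumsq = [0.0] * N

    def draw(i):
        # Take one sample from population i and update its running statistics.
        x = rng.gauss(mus[i], sigmas[i])
        counts[i] += 1
        sums[i] += x
        sumsq[i] += x * x

    def index(i, n):
        t = counts[i]
        mean = sums[i] / t
        # Biased (1/t) variance; clamp tiny negative values from rounding.
        var = max(0.0, sumsq[i] / t - mean * mean)
        return ism_index(mean, math.sqrt(var), t, n)

    # Assumed initialization: three samples per population, so that the
    # exponent 2/(t-2) is finite and a variance estimate exists.
    for i in range(N):
        for _ in range(3):
            draw(i)
    n = 3 * N
    while n < horizon:
        best = max(range(N), key=lambda i: index(i, n))
        draw(best)
        n += 1
    return counts


def pseudo_regret(mus, counts):
    """Regret decomposition: sum over i of (mu* - mu_i) * T_i(n)."""
    mu_star = max(mus)
    return sum((mu_star - mu) * t for mu, t in zip(mus, counts))
```

For example, `run_policy([0.0, 1.0, 2.0], [1.0, 1.0, 1.0], horizon=2000)` should allocate most samples to the third population, and `pseudo_regret` then gives the realized regret of that run. Averaging it over many seeds estimates the expected regret that the paper's asymptotic and finite horizon results concern.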

research
05/08/2015

An Asymptotically Optimal Policy for Uniform Bandits of Unknown Support

Consider the problem of a controller sampling sequentially from a finite...
research
09/09/2015

Asymptotically Optimal Multi-Armed Bandit Policies under a Cost Constraint

We develop asymptotically optimal policies for the multi armed bandit (M...
research
10/07/2015

Asymptotically Optimal Sequential Experimentation Under Generalized Ranking

We consider the classical problem of a controller activating (or samplin...
research
01/19/2012

Adaptive Policies for Sequential Sampling under Incomplete Information and a Cost Constraint

We consider the problem of sequential sampling from a finite number of i...
research
02/28/2023

Asymptotically Optimal Thompson Sampling Based Policy for the Uniform Bandits and the Gaussian Bandits

Thompson sampling (TS) for the parametric stochastic multi-armed bandits...
research
10/23/2022

Tight relative estimation in the mean of Bernoulli random variables

Given a stream of Bernoulli random variables, consider the problem of es...
research
05/03/2018

Nonparametric Learning and Optimization with Covariates

Modern decision analytics frequently involves the optimization of an obj...