Information Directed Sampling and Bandits with Heteroscedastic Noise

by Johannes Kirschner et al.

In the stochastic bandit problem, the goal is to maximize an unknown function via a sequence of noisy function evaluations. Typically, the observation noise is assumed to be independent of the evaluation point and to satisfy a tail bound uniformly over the domain. In this work, we consider the setting of heteroscedastic noise, that is, we explicitly allow the noise distribution to depend on the evaluation point. We show that this leads to new trade-offs between information and regret, which are not accounted for by existing approaches such as upper confidence bound (UCB) algorithms or Thompson Sampling. To address these shortcomings, we introduce a frequentist regret framework similar to the Bayesian analysis of Russo and Van Roy (2014). We prove a new high-probability regret bound for general, possibly randomized policies, which depends on a quantity we call the regret-information ratio. From this bound, we define a frequentist version of Information Directed Sampling (IDS) that minimizes a surrogate of the regret-information ratio over all possible action sampling distributions. To construct the surrogate function, we generalize known concentration inequalities for least squares regression in separable Hilbert spaces to the case of heteroscedastic noise. This allows us to formulate several variants of IDS for linear and reproducing kernel Hilbert space response functions, yielding a family of novel algorithms for Bayesian optimization. We also provide frequentist regret bounds, which in the homoscedastic case are comparable to existing bounds for UCB, but can be much better when the noise is heteroscedastic. Finally, we demonstrate empirically in a linear setting that some of our methods can outperform UCB and Thompson Sampling, even when the noise is homoscedastic.
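To make the regret-information trade-off concrete, here is a minimal, illustrative sketch of the IDS sampling step for a finite action set. This is not the authors' algorithm for the heteroscedastic Hilbert-space setting; it assumes we are already given per-action regret estimates `deltas` and information-gain estimates `infos` (both hypothetical inputs). Following Russo and Van Roy (2014), the distribution minimizing the squared-regret-to-information ratio is supported on at most two actions, so a search over action pairs with a grid on the mixing weight suffices.

```python
import numpy as np

def ids_distribution(deltas, infos, grid_size=101):
    """Return a sampling distribution over K actions that minimizes the
    (surrogate) regret-information ratio Delta(pi)^2 / I(pi).

    deltas: estimated per-action regret (suboptimality gaps), length K
    infos:  estimated per-action information gain, length K

    The minimizer mixes at most two actions, so we search all ordered
    pairs (a, b) and a grid over the mixing probability p.
    """
    deltas = np.asarray(deltas, dtype=float)
    infos = np.asarray(infos, dtype=float)
    K = len(deltas)
    ps = np.linspace(0.0, 1.0, grid_size)  # grid over mixing weight p
    best_ratio, best_dist = np.inf, None
    for a in range(K):
        for b in range(K):
            # Expected regret and information of the mixture p*a + (1-p)*b
            d = ps * deltas[a] + (1 - ps) * deltas[b]
            i = ps * infos[a] + (1 - ps) * infos[b]
            with np.errstate(divide="ignore", invalid="ignore"):
                ratio = np.where(i > 0, d ** 2 / i, np.inf)
            j = int(np.argmin(ratio))
            if ratio[j] < best_ratio:
                best_ratio = ratio[j]
                dist = np.zeros(K)
                dist[a] += ps[j]
                dist[b] += 1.0 - ps[j]
                best_dist = dist
    return best_dist, best_ratio

# Example: action 0 has low regret but little information, action 1 the
# reverse; IDS mixes the two rather than committing to either one.
dist, ratio = ids_distribution([1.0, 2.0], [1.0, 4.0])
```

For this example the optimal mixture puts weight roughly 2/3 on the low-regret action, achieving a strictly smaller ratio than either pure action (both of which have ratio 1). Randomization is essential here; a deterministic rule such as UCB cannot realize this trade-off, which is one motivation for sampling-based IDS.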


