What You See May Not Be What You Get: UCB Bandit Algorithms Robust to ε-Contamination

10/12/2019
by Laura Niss, et al.

Motivated by applications of bandit algorithms in education, we consider a stochastic multi-armed bandit problem with ε-contaminated rewards. We allow an adversary to inject arbitrary, unbounded contaminated rewards with full knowledge of the past and future, subject only to the constraint that at any time t the proportion of contaminated rewards for any action is at most ε. We derive concentration inequalities for two robust mean estimators for sub-Gaussian distributions in the ε-contamination context. We define the ε-contaminated stochastic bandit problem and use our robust mean estimators to give two variants of a robust Upper Confidence Bound (UCB) algorithm, crUCB. Measuring regret with respect to only the underlying stochastic rewards, both variants of crUCB achieve O(√(KT log T)) regret when ε is small enough. Our simulations are designed to reflect reasonable settings a teacher would experience when implementing a bandit algorithm, and thus use a limited horizon. We show that in certain adversarial regimes crUCB not only outperforms algorithms designed for stochastic (UCB1) and adversarial (EXP3) bandits but also those with "best of both worlds" guarantees (EXP3++ and TsallisInf), even when our constraint on ε is violated.
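To make the setup concrete, below is a minimal Python sketch of a contamination-robust UCB loop, assuming a trimmed mean as the robust estimator and a standard sub-Gaussian exploration bonus. The trimming level, the bonus width, and the names trimmed_mean and cr_ucb are illustrative choices, not the paper's exact crUCB specification, whose confidence widths come from its robust concentration inequalities.

```python
import numpy as np

def trimmed_mean(x, eps):
    """Trimmed mean: sort the samples and drop a ceil(eps * n)-sized
    slice from each tail before averaging, so that up to an eps
    fraction of arbitrary outliers has bounded influence."""
    x = np.asarray(x, dtype=float)
    n = x.size
    k = int(np.ceil(eps * n))
    if n <= 2 * k:          # too few samples to trim; fall back to the plain mean
        return float(x.mean())
    s = np.sort(x)
    return float(s[k:n - k].mean())

def cr_ucb(arms, horizon, eps, sigma=1.0, seed=0):
    """Contamination-robust UCB sketch: pull each arm once, then play the
    arm maximizing robust mean + exploration bonus. `arms` is a list of
    callables rng -> (possibly contaminated) reward. The bonus below is
    the standard sub-Gaussian UCB width, an assumption for illustration;
    the paper's crUCB derives its widths from robust concentration bounds."""
    rng = np.random.default_rng(seed)
    K = len(arms)
    rewards = [[] for _ in range(K)]
    for t in range(horizon):
        if t < K:
            a = t  # initialization: pull each arm once
        else:
            index = [
                trimmed_mean(rewards[i], eps)
                + sigma * np.sqrt(2.0 * np.log(t) / len(rewards[i]))
                for i in range(K)
            ]
            a = int(np.argmax(index))
        rewards[a].append(arms[a](rng))
    return rewards

# Example: two Gaussian arms; a hypothetical adversary flips a small
# fraction of the better arm's rewards to a large negative value.
if __name__ == "__main__":
    def arm0(rng):
        return rng.normal(0.5, 1.0)
    def arm1(rng):
        return -50.0 if rng.random() < 0.03 else rng.normal(0.6, 1.0)
    history = cr_ucb([arm0, arm1], horizon=500, eps=0.05)
    print([len(h) for h in history])  # pull counts per arm
```

The key departure from UCB1 is that the arm index is built from a robust mean, so a small fraction of arbitrarily corrupted rewards cannot drag an arm's estimate without bound.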


Related research

05/24/2019 · OSOM: A Simultaneously Optimal Algorithm for Multi-Armed and Linear Contextual Bandits
We consider the stochastic linear (multi-armed) contextual bandit proble...

08/16/2017 · Corrupt Bandits for Preserving Local Privacy
We study a variant of the stochastic multi-armed bandit (MAB) problem in...

09/08/2012 · Bandits with heavy tail
The stochastic multi-armed bandit problem is well understood when the re...

02/19/2020 · Action-Manipulation Attacks Against Stochastic Bandits: Attacks and Defense
Due to the broad range of applications of stochastic multi-armed bandit ...

03/04/2020 · Bandits with adversarial scaling
We study "adversarial scaling", a multi-armed bandit model where rewards...

05/29/2023 · Robust Lipschitz Bandits to Adversarial Corruptions
Lipschitz bandit is a variant of stochastic bandits that deals with a co...

11/02/2020 · Stochastic Linear Bandits with Protected Subspace
We study a variant of the stochastic linear bandit problem wherein we op...
