Near Optimal Adversarial Attack on UCB Bandits

08/21/2020
by   Shiliang Zuo, et al.
0

We consider a stochastic multi-arm bandit problem where rewards are subject to adversarial corruption. We propose a novel attack strategy that manipulates a UCB principle into pulling some non-optimal target arm T - o(T) times with a cumulative cost that scales as √(log T), where T is the number of rounds. We also prove the first lower bound on the cumulative attack cost. Our lower bound matches our upper bound up to loglog T factors, showing our attack to be near optimal.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset

Sign in with Google

×

Use your Google Account to sign in to DeepAI

×

Consider DeepAI Pro