Crush Optimism with Pessimism: Structured Bandits Beyond Asymptotic Optimality

by Kwang-Sung Jun, et al.

We study stochastic structured bandits for minimizing regret. The fact that the popular optimistic algorithms do not achieve the asymptotic instance-dependent regret optimality (asymptotic optimality for short) has recently attracted the attention of researchers. On the other hand, it is known that one can achieve bounded regret (i.e., regret that does not grow indefinitely with the time horizon n) in certain instances. Unfortunately, existing asymptotically optimal algorithms rely on forced sampling that introduces an ω(1) term w.r.t. the time horizon n in their regret, failing to adapt to the “easiness” of the instance. In this paper, we focus on the finite hypothesis class and ask whether one can achieve asymptotic optimality while enjoying bounded regret whenever possible. We provide a positive answer by introducing a new algorithm called CRush Optimism with Pessimism (CROP) that eliminates optimistic hypotheses by pulling the informative arms indicated by a pessimistic hypothesis. Our finite-time analysis shows that CROP (i) achieves a constant-factor asymptotic optimality and, thanks to the forced-exploration-free design, (ii) adapts to bounded regret, and (iii) its regret bound scales not with the number of arms K but with an effective number of arms K_ψ that we introduce. We also discuss a problem class where CROP can be exponentially better than existing algorithms in nonasymptotic regimes. Finally, we observe that even a clairvoyant oracle who plays according to the asymptotically optimal arm-pull scheme may suffer a linear worst-case regret, indicating that it may not be the end of optimism.
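The abstract's core idea, eliminating hypotheses from a finite class by pulling informative arms until a single hypothesis survives, can be illustrated with a toy sketch. This is *not* the CROP algorithm; the disagreement-based arm choice, the confidence radius, and all names below are illustrative assumptions for exposition only.

```python
import math
import random

# Illustrative sketch only (not CROP): a toy elimination scheme over a
# finite hypothesis class. Every constant and rule here is an assumption.

def eliminate_hypotheses(hypotheses, true_means, horizon, seed=0):
    """Each hypothesis is a tuple of K candidate mean rewards.

    Pull the arm on which the surviving hypotheses disagree the most,
    observe a noisy reward, and discard hypotheses whose predicted mean
    falls outside a shrinking confidence interval. Once a single
    hypothesis survives, commit to its best arm, so regret stops
    accumulating (the "bounded regret" behavior mentioned above).
    """
    rng = random.Random(seed)
    K = len(true_means)
    counts = [0] * K       # pulls per arm
    sums = [0.0] * K       # cumulative reward per arm
    alive = list(hypotheses)

    def plausible(h):
        # A hypothesis survives if its prediction for every sampled arm
        # lies within a sub-Gaussian-style confidence radius.
        for a in range(K):
            if counts[a] == 0:
                continue
            radius = math.sqrt(2.0 * math.log(horizon) / counts[a])
            if abs(h[a] - sums[a] / counts[a]) > radius:
                return False
        return True

    for _ in range(horizon):
        if len(alive) == 1:
            # Exploit: best arm of the single surviving hypothesis.
            arm = max(range(K), key=lambda a: alive[0][a])
        else:
            # Most "informative" arm: largest disagreement among survivors.
            arm = max(range(K), key=lambda a: max(h[a] for h in alive)
                                            - min(h[a] for h in alive))
        reward = true_means[arm] + rng.gauss(0.0, 0.1)
        counts[arm] += 1
        sums[arm] += reward
        # Keep at least one hypothesis even if noise eliminates all.
        alive = [h for h in alive if plausible(h)] or alive[:1]
    return alive
```

With two hypotheses that disagree on both arms, a few dozen pulls of a disagreeing arm suffice to rule out the wrong one, after which the sketch commits and incurs no further regret; CROP's actual pull scheme and guarantees are, of course, far more refined.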

