Better Best of Both Worlds Bounds for Bandits with Switching Costs
We study best-of-both-worlds algorithms for bandits with switching costs, a problem recently addressed by Rouyer, Seldin and Cesa-Bianchi, 2021. We introduce a surprisingly simple and effective algorithm that simultaneously achieves a minimax-optimal regret bound of 𝒪(T^{2/3}) in the oblivious adversarial setting and a bound of 𝒪(min{log(T)/Δ^2, T^{2/3}}) in the stochastically-constrained regime, both with (unit) switching costs, where Δ is the gap between the arms. In the stochastically-constrained case, our bound improves over the previous result of Rouyer et al., which achieved regret of 𝒪(T^{1/3}/Δ). We accompany our results with a lower bound showing that, in general, Ω̃(min{1/Δ^2, T^{2/3}}) regret is unavoidable in the stochastically-constrained case for algorithms with 𝒪(T^{2/3}) worst-case regret.
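To get a feel for how the two stochastically-constrained rates scale, the following minimal sketch evaluates the expressions inside the 𝒪(·) with every constant set to 1. The function names `new_bound` and `previous_bound` are hypothetical and introduced only for illustration; the numbers show scaling, not actual regret values, since the abstract does not state the constants.

```python
import math


def new_bound(T: int, gap: float) -> float:
    """min{log(T)/gap^2, T^(2/3)} with all constants assumed to be 1."""
    return min(math.log(T) / gap**2, T ** (2 / 3))


def previous_bound(T: int, gap: float) -> float:
    """T^(1/3)/gap with all constants assumed to be 1."""
    return T ** (1 / 3) / gap


if __name__ == "__main__":
    T = 10**6
    # Illustrative gap values only; they are not taken from the paper.
    for gap in (0.5, 0.2, 0.05):
        print(f"gap={gap}: new={new_bound(T, gap):.1f}, "
              f"previous={previous_bound(T, gap):.1f}")
```

With unit constants, log(T)/Δ^2 is below T^{1/3}/Δ exactly when Δ > log(T)/T^{1/3}, so the gain in this simplified comparison is largest for large gaps. For smaller gaps the two expressions differ by at most a log(T) factor, because the minimum caps the new rate at T^{2/3}.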