Online learning over a finite action set with limited switching

by Jason Altschuler, et al.

This paper studies the value of switching actions in the Prediction From Experts (PFE) problem and the Adversarial Multi-Armed Bandits (MAB) problem. First, we revisit the well-studied and practically motivated setting of PFE with switching costs. Many algorithms are known to achieve the minimax optimal order of O(√(T log n)) in expectation for both regret and number of switches, where T is the number of iterations and n is the number of actions. However, no high probability (h.p.) guarantees are known. Our main technical contribution is the first algorithms that achieve this optimal order with h.p. for both regret and switches. This settles an open problem of [Devroye et al., 2015], and directly implies the first h.p. guarantees for several problems of interest. Next, to investigate the value of switching actions at a more granular level, we introduce the setting of switching budgets, in which algorithms are limited to S ≤ T switches between actions. This entails a limited number of free switches, in contrast to the unlimited number of expensive switches in the switching cost setting. Using the above result and several reductions, we unify previous work and completely characterize the complexity of this switching budget setting up to small polylogarithmic factors: for both PFE and MAB, for all switching budgets S ≤ T, and for both expectation and h.p. guarantees. For PFE, we show the optimal rate is Θ̃(√(T log n)) for S = Ω(√(T log n)), and min(Θ̃(T log n / S), T) for S = O(√(T log n)). Interestingly, the bandit setting does not exhibit such a phase transition; instead we show the minimax rate decays steadily as min(Θ̃(T√n / √S), T) for all ranges of S ≤ T. These results recover and generalize the known minimax rates for the (arbitrary) switching cost setting.
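To make the contrast between the two regimes concrete, the following minimal sketch evaluates the stated rates numerically. It assumes the rates read as √(T log n) and min(T log n / S, T) for PFE and min(T√n / √S, T) for MAB, and it drops all constants and polylogarithmic factors, so only the scaling in T, n, and S is meaningful. The function names `pfe_rate` and `mab_rate` are hypothetical and do not come from the paper.

```python
import math


def _check(T: int, n: int, S: int) -> None:
    # The switching budget setting assumes 1 <= S <= T and at least 2 actions.
    if n < 2 or T < 1 or not (1 <= S <= T):
        raise ValueError("need n >= 2, T >= 1 and 1 <= S <= T")


def pfe_rate(T: int, n: int, S: int) -> float:
    """Order of the minimax regret for PFE with at most S switches.

    Assumption: constants and polylog factors are ignored.
    """
    _check(T, n, S)
    threshold = math.sqrt(T * math.log(n))
    if S >= threshold:
        # High-budget regime: extra switches no longer help.
        return threshold
    # Low-budget regime: regret scales inversely with S, capped at T.
    return min(T * math.log(n) / S, T)


def mab_rate(T: int, n: int, S: int) -> float:
    """Order of the minimax regret for MAB with at most S switches.

    Assumption: constants and polylog factors are ignored.
    """
    _check(T, n, S)
    # No phase transition: the rate decays as 1/sqrt(S) over the whole range.
    return min(T * math.sqrt(n) / math.sqrt(S), T)


if __name__ == "__main__":
    T, n = 10_000, 16
    for S in (1, 10, 100, 1_000, 10_000):
        print(S, round(pfe_rate(T, n, S), 1), round(mab_rate(T, n, S), 1))
```

In this sketch the PFE rate stops improving once S exceeds roughly √(T log n), while the MAB rate keeps decreasing until S = T, where it matches the familiar √(T n) order.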




Online learning with feedback graphs and switching costs

We study online learning when partial feedback information is provided f...

Understanding the Role of Feedback in Online Learning with Switching Costs

In this paper, we study the role of feedback in online learning with swi...

Minimax Regret of Switching-Constrained Online Convex Optimization: No Phase Transition

We study the problem of switching-constrained online convex optimization...

Multinomial Logit Bandit with Low Switching Cost

We study multinomial logit bandit with limited adaptivity, where the alg...

Better Best of Both Worlds Bounds for Bandits with Switching Costs

We study best-of-both-worlds algorithms for bandits with switching cost,...

Equipping Experts/Bandits with Long-term Memory

We propose the first reduction-based approach to obtaining long-term mem...

Anomaly Search Over Many Sequences With Switching Costs

This paper considers the quickest search problem to identify anomalies a...
