Understanding the Role of Feedback in Online Learning with Switching Costs

06/16/2023
by Duo Cheng et al.

In this paper, we study the role of feedback in online learning with switching costs. It has been shown that the minimax regret is Θ(T^2/3) under bandit feedback and improves to Θ(√(T)) under full-information feedback, where T is the length of the time horizon. However, it remains largely unknown how the amount and type of feedback generally impact regret. To this end, we first consider the setting of bandit learning with extra observations; that is, in addition to the typical bandit feedback, the learner can freely make a total of B_ex extra observations. We fully characterize the minimax regret in this setting, which exhibits an interesting phase-transition phenomenon: when B_ex = O(T^2/3), the regret remains Θ(T^2/3), but when B_ex = Ω(T^2/3), it becomes Θ(T/√(B_ex)), which improves as the budget B_ex increases. To design algorithms that can achieve the minimax regret, it is instructive to consider a more general setting where the learner has a budget of B total observations. We fully characterize the minimax regret in this setting as well and show that it is Θ(T/√(B)), which scales smoothly with the total budget B. Furthermore, we propose a generic algorithmic framework, which enables us to design different learning algorithms that can achieve matching upper bounds for both settings based on the amount and type of feedback. One interesting finding is that while bandit feedback can still guarantee optimal regret when the budget is relatively limited, it no longer suffices to achieve optimal regret when the budget is relatively large.
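The stated minimax rates can be summarized as simple scaling functions. The sketch below is purely illustrative (the function names and the constant `c` are placeholders, not from the paper): it encodes the phase transition for the extra-observation setting, where the rate stays Θ(T^(2/3)) until B_ex reaches order T^(2/3) and then improves to Θ(T/√B_ex), and the smooth Θ(T/√B) rate for the total-budget setting.

```python
import math

def minimax_regret_extra_obs(T, B_ex, c=1.0):
    """Illustrative rate (up to constants) for bandit learning with
    switching costs plus B_ex free extra observations:
      Theta(T^{2/3})        when B_ex = O(T^{2/3})
      Theta(T / sqrt(B_ex)) when B_ex = Omega(T^{2/3})
    The min captures both regimes: for small B_ex the T^{2/3} term
    dominates, and the two branches coincide at B_ex ~ T^{2/3}.
    """
    return c * min(T ** (2 / 3), T / math.sqrt(max(B_ex, 1)))

def minimax_regret_total_budget(T, B, c=1.0):
    """Illustrative Theta(T / sqrt(B)) rate when the learner has a
    total budget of B observations (of any type)."""
    return c * T / math.sqrt(max(B, 1))
```

Note how the two settings agree at the transition point: plugging B_ex = T^(2/3) into T/√B_ex gives exactly T^(2/3), so the phase transition is continuous in the budget.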


