Bandits with Feedback Graphs and Switching Costs

07/29/2019
by Raman Arora, et al.

We study the adversarial multi-armed bandit problem where partial observations are available and where, in addition to the loss incurred for each action, a switching cost is incurred for shifting to a new action. All previously known results incur a regret factor that depends on the independence number of the feedback graph. We give a new algorithm whose regret guarantee depends only on the domination number of the graph. We complement that result with a lower bound. Finally, we give a new algorithm with improved policy regret bounds when partial counterfactual feedback is available.
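The key graph quantity in the stated guarantee is the domination number: the size of a smallest set of arms whose observations jointly cover every arm. The sketch below is not the paper's algorithm; it only illustrates, for an assumed adjacency-dictionary representation of the feedback graph (each arm mapped to the set of arms it observes, including itself), how a small dominating set can be found greedily, within a logarithmic factor of the optimum.

```python
def greedy_dominating_set(neighbors):
    """Greedily build a small dominating set of a feedback graph.

    `neighbors` maps each arm to the set of arms whose losses are observed
    when that arm is played (assumed here to include the arm itself).
    The greedy choice covers the most still-uncovered arms at each step,
    giving a dominating set within a logarithmic factor of the minimum.
    """
    uncovered = set(neighbors)
    dominating = []
    while uncovered:
        # Pick the arm whose observations cover the most uncovered arms.
        best = max(neighbors, key=lambda v: len(neighbors[v] & uncovered))
        dominating.append(best)
        uncovered -= neighbors[best]
    return dominating


if __name__ == "__main__":
    # Toy feedback graph on 5 arms: playing arm 0 also reveals arms 1 and 2, etc.
    graph = {
        0: {0, 1, 2},
        1: {1, 0},
        2: {2, 0, 3},
        3: {3, 2, 4},
        4: {4, 3},
    }
    print(greedy_dominating_set(graph))  # e.g. [0, 3]
```

In this toy example the dominating set {0, 3} has size 2, while the graph's independence number is larger, which is the kind of gap that makes a domination-number-dependent guarantee attractive.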


