Bandits with Feedback Graphs and Switching Costs

07/29/2019
by Raman Arora, et al.

We study the adversarial multi-armed bandit problem where partial observations are available and where, in addition to the loss incurred for each action, a switching cost is incurred for shifting to a new action. All previously known results incur a factor proportional to the independence number of the feedback graph. We give a new algorithm whose regret guarantee depends only on the domination number of the graph. We further supplement that result with a lower bound. Finally, we also give a new algorithm with improved policy regret bounds when partial counterfactual feedback is available.
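
To make the two graph parameters in the abstract concrete, here is a minimal brute-force sketch that computes the independence number and the domination number of a small graph. It assumes an undirected feedback graph stored as an adjacency dictionary; the function names and the star-graph example are illustrative choices, not taken from the paper. On a star graph the independence number grows with the number of leaves while the domination number stays at 1, which shows how far apart the two quantities can be.

```python
from itertools import combinations


def is_independent(adj, subset):
    """True if no two vertices in `subset` are adjacent."""
    return all(v not in adj[u] for u, v in combinations(subset, 2))


def is_dominating(adj, subset):
    """True if every vertex is in `subset` or adjacent to a member of it."""
    covered = set(subset)
    for u in subset:
        covered |= adj[u]
    return len(covered) == len(adj)


def independence_number(adj):
    """Size of the largest independent set (exhaustive search, small graphs only)."""
    nodes = list(adj)
    for k in range(len(nodes), 0, -1):
        if any(is_independent(adj, s) for s in combinations(nodes, k)):
            return k
    return 0


def domination_number(adj):
    """Size of the smallest dominating set (exhaustive search, small graphs only)."""
    nodes = list(adj)
    for k in range(1, len(nodes) + 1):
        if any(is_dominating(adj, s) for s in combinations(nodes, k)):
            return k
    return 0


def star_graph(n_leaves):
    """Hypothetical example: vertex 0 is the hub, vertices 1..n_leaves are leaves."""
    adj = {0: set(range(1, n_leaves + 1))}
    for leaf in range(1, n_leaves + 1):
        adj[leaf] = {0}
    return adj


star = star_graph(5)
alpha = independence_number(star)  # the five leaves form an independent set
gamma = domination_number(star)    # the hub alone dominates every vertex
print(alpha, gamma)
```

In bandit terms, under the assumption that playing an action reveals the losses of its neighbors, playing the hub of the star reveals every loss, whereas the leaves are pairwise non-adjacent.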
