From Bandits to Experts: A Tale of Domination and Independence

07/17/2013
by Noga Alon, et al.
We consider the partial observability model for multi-armed bandits, introduced by Mannor and Shamir. Our main result is a characterization of regret in the directed observability model in terms of the dominating and independence numbers of the observability graph. We also show that in the undirected case, the learner can achieve optimal regret without even accessing the observability graph before selecting an action. Both results are shown using variants of the Exp3 algorithm operating on the observability graph in a time-efficient manner.
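
To make the setting concrete, the following is a minimal sketch of an Exp3-style learner with graph side observations. It is an illustration under stated assumptions, not the authors' exact algorithm: the function name `exp3_graph_feedback`, the `observes` representation of the observability graph, and the learning rate `eta` are all hypothetical choices made here. Each observed loss is divided by the probability that it would have been observed, which keeps the loss estimates unbiased.

```python
import math
import random


def _normalize(log_w):
    """Turn log-weights into a probability vector (shifted for stability)."""
    m = max(log_w)
    w = [math.exp(v - m) for v in log_w]
    total = sum(w)
    return [x / total for x in w]


def exp3_graph_feedback(losses, observes, eta, rng=None):
    """Exp3-style learner with side observations (illustrative sketch).

    losses:   list of rounds; each round is a list of per-arm losses in [0, 1]
    observes: observes[j] is the set of arms whose losses are revealed when
              arm j is played (assumed here to always contain j itself)
    eta:      learning rate (hypothetical tuning parameter)
    Returns (arms played, final probability vector).
    """
    rng = rng or random.Random(0)
    k = len(observes)
    log_w = [0.0] * k  # start from uniform weights
    played = []
    for round_losses in losses:
        probs = _normalize(log_w)
        arm = rng.choices(range(k), weights=probs)[0]
        played.append(arm)
        for i in observes[arm]:
            # Probability that arm i's loss is observed this round:
            # total mass on arms whose play reveals i.
            q_i = sum(probs[j] for j in range(k) if i in observes[j])
            # Importance-weighted loss estimate, applied to the log-weight.
            log_w[i] -= eta * round_losses[i] / q_i
    return played, _normalize(log_w)
```

With `observes[j] = {j}` for every arm this reduces to plain bandit feedback, and with `observes[j]` equal to the full arm set it reduces to the experts setting, which are the two extremes named in the title.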


research
05/23/2018

Analysis of Thompson Sampling for Graphical Bandits Without the Graphs

We study multi-armed bandit problems with graph feedback, in which the d...
research
07/29/2019

Bandits with Feedback Graphs and Switching Costs

We study the adversarial multi-armed bandit problem where partial observ...
research
12/10/2020

Adversarial Linear Contextual Bandits with Graph-Structured Side Observations

This paper studies the adversarial graphical contextual bandits, a varia...
research
02/28/2022

Robust Multi-Agent Bandits Over Undirected Graphs

We consider a multi-agent multi-armed bandit setting in which n honest a...
research
07/12/2019

Gittins' theorem under uncertainty

We study dynamic allocation problems for discrete time multi-armed bandi...
research
06/13/2011

From Bandits to Experts: On the Value of Side-Observations

We consider an adversarial online learning setting where a decision make...
research
06/03/2023

Incentivizing Exploration with Linear Contexts and Combinatorial Actions

We advance the study of incentivized bandit exploration, in which arm ch...
