Individual Regret in Cooperative Nonstochastic Multi-Armed Bandits

07/07/2019
by Yogev Bar-On, et al.

We study agents that communicate over an underlying network by exchanging messages, in order to minimize their individual regret in a common nonstochastic multi-armed bandit problem. We derive regret-minimization algorithms that guarantee, for each agent v, an individual expected regret of O(√((1+K/|N(v)|)T)), where T is the number of time steps, K is the number of actions, and N(v) is the set of neighbors of agent v in the communication graph. We present algorithms both for the case where the communication graph is known to all agents and for the case where it is unknown. When the graph is unknown, each agent knows only the set of its neighbors and an upper bound on the total number of agents. The individual regret bounds in the two settings differ only by a logarithmic factor. Our work resolves an open problem posed by [Cesa-Bianchi et al., 2019b].
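To illustrate how the stated bound scales with an agent's connectivity, here is a minimal Python sketch. It is an illustration only, not the paper's algorithm: it simply evaluates √((1+K/|N(v)|)T) per agent, dropping constant and logarithmic factors. The star-shaped graph, the agent names, and the values of K and T are assumptions made for this example.

import math

# Minimal illustration (assumed example, not the paper's algorithm):
# evaluate the per-agent regret bound sqrt((1 + K/|N(v)|) * T),
# ignoring constant and logarithmic factors.

def individual_regret_bound(num_neighbors, K, T):
    # num_neighbors = |N(v)|, the agent's degree in the communication graph
    return math.sqrt((1 + K / num_neighbors) * T)

# Hypothetical star-shaped communication graph with 5 agents.
neighbors = {
    "hub": {"a", "b", "c", "d"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub"},
    "d": {"hub"},
}

K, T = 100, 10_000  # illustrative numbers of actions and time steps
for agent, nbrs in neighbors.items():
    bound = individual_regret_bound(len(nbrs), K, T)
    print(f"{agent}: ~{bound:.0f}")

With these illustrative numbers, the hub's bound is about 510 while each leaf's is about 1005 (roughly √(KT), the single-agent rate), showing how better-connected agents enjoy smaller individual regret.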


