Cooperative Multi-agent Bandits: Distributed Algorithms with Optimal Individual Regret and Constant Communication Costs

08/08/2023
by   Lin Yang, et al.
0

Recently, there has been extensive study of cooperative multi-agent multi-armed bandits where a set of distributed agents cooperatively play the same multi-armed bandit game. The goal is to develop bandit algorithms with the optimal group and individual regrets and low communication between agents. The prior work tackled this problem using two paradigms: leader-follower and fully distributed algorithms. Prior algorithms in both paradigms achieve the optimal group regret. The leader-follower algorithms achieve constant communication costs but fail to achieve optimal individual regrets. The state-of-the-art fully distributed algorithms achieve optimal individual regrets but fail to achieve constant communication costs. This paper presents a simple yet effective communication policy and integrates it into a learning algorithm for cooperative bandits. Our algorithm achieves the best of both paradigms: optimal individual regret and constant communication costs.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/30/2022

On Regret-optimal Cooperative Nonstochastic Multi-armed Bandits

We consider the nonstochastic multi-agent multi-armed bandit problem wit...
research
03/08/2021

Provably Efficient Cooperative Multi-Agent Reinforcement Learning with Function Approximation

Reinforcement learning in cooperative multi-agent settings has recently ...
research
02/15/2023

On-Demand Communication for Asynchronous Multi-Agent Bandits

This paper studies a cooperative multi-agent multi-armed stochastic band...
research
06/08/2021

Cooperative Stochastic Multi-agent Multi-armed Bandits Robust to Adversarial Corruptions

We study the problem of stochastic bandits with adversarial corruptions ...
research
03/29/2021

Distributed learning in congested environments with partial information

How can non-communicating agents learn to share congested resources effi...
research
10/26/2015

A Parallel algorithm for X-Armed bandits

The target of X-armed bandit problem is to find the global maximum of an...
research
04/26/2016

Distributed Clustering of Linear Bandits in Peer to Peer Networks

We provide two distributed confidence ball algorithms for solving linear...

Please sign up or login with your details

Forgot password? Click here to reset