Decentralized Multi-Armed Bandit Can Outperform Classic Upper Confidence Bound

11/22/2021
by Jingxuan Zhu, et al.

This paper studies a decentralized multi-armed bandit problem in a multi-agent network. The problem is solved simultaneously by N agents, all of which face a common set of M arms whose rewards have the same mean for every agent. Each agent can receive information only from its neighbors, where the neighbor relations are described by a directed graph whose vertices represent agents and whose directed edges depict the neighbor relations. A fully decentralized multi-armed bandit algorithm is proposed for each agent, combining a variant of the classic consensus algorithm with the upper confidence bound (UCB) algorithm. It is shown that the algorithm guarantees that each agent achieves a lower logarithmic asymptotic regret than the classic UCB, provided the neighbor graph is strongly connected. The regret can be further improved if the neighbor graph is undirected.
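Since the abstract only sketches how consensus averaging and UCB are combined, a small simulation may help make the idea concrete. The Python sketch below is a hypothetical illustration, not the paper's algorithm: the ring neighbor graph, the uniform row-stochastic weight matrix W, the Bernoulli rewards, and the plain UCB1 index are all assumptions made for this example.

```python
import numpy as np

# Minimal sketch of a consensus-plus-UCB scheme, assuming:
#   - a ring neighbor graph with self-loops (hypothetical choice),
#   - a row-stochastic weight matrix W respecting that graph,
#   - Bernoulli arm rewards with means shared by all agents.
# This illustrates the general idea, not the paper's exact update rules.

rng = np.random.default_rng(0)

N, M, T = 4, 3, 2000                 # agents, arms, horizon
true_means = np.array([0.2, 0.5, 0.8])
gaps = true_means.max() - true_means

# Uniform consensus weights on a ring with self-loops.
W = np.zeros((N, N))
for i in range(N):
    for j in (i - 1, i, i + 1):
        W[i, j % N] = 1.0 / 3.0

est = np.zeros((N, M))               # consensus-mixed reward estimates
counts = np.ones((N, M))             # local pull counts after warm start
pulls = np.ones((N, M))              # total pulls, for regret accounting

# Warm start: every agent samples each arm once.
for k in range(M):
    est[:, k] = rng.binomial(1, true_means[k], size=N)

for t in range(M, T):
    # Each agent plays the arm maximizing its UCB index.
    bonus = np.sqrt(2.0 * np.log(t + 1) / counts)
    arms = np.argmax(est + bonus, axis=1)

    # Observe a local reward and fold it into the running estimate.
    for i in range(N):
        r = rng.binomial(1, true_means[arms[i]])
        counts[i, arms[i]] += 1
        pulls[i, arms[i]] += 1
        est[i, arms[i]] += (r - est[i, arms[i]]) / counts[i, arms[i]]

    # One consensus step: mix estimates with neighbors' estimates.
    est = W @ est

# Expected regret per agent: suboptimality gap times pull count.
print("per-agent regret:", pulls @ gaps)
```

In this sketch, the consensus step `est = W @ est` is where cooperation enters: each agent's estimates drift toward a network-wide average, so every agent effectively benefits from its neighbors' samples, which is the intuition behind beating a single UCB learner.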


Related research

10/10/2018 · Decentralized Cooperative Stochastic Multi-armed Bandits
We study a decentralized cooperative stochastic multi-armed bandit probl...

10/24/2020 · Federated Bandit: A Gossiping Approach
In this paper, we study Federated Bandit, a decentralized Multi-Armed Ba...

03/29/2021 · Distributed learning in congested environments with partial information
How can non-communicating agents learn to share congested resources effi...

07/20/2023 · Decentralized Smart Charging of Large-Scale EVs using Adaptive Multi-Agent Multi-Armed Bandits
The drastic growth of electric vehicles and photovoltaics can introduce ...

04/09/2019 · A Note on the Equivalence of Upper Confidence Bounds and Gittins Indices for Patient Agents
This note gives a short, self-contained, proof of a sharp connection bet...

02/10/2023 · Piecewise-Stationary Multi-Objective Multi-Armed Bandit with Application to Joint Communications and Sensing
We study a multi-objective multi-armed bandit problem in a dynamic envir...
