Decentralized Multi-Armed Bandit Can Outperform Classic Upper Confidence Bound

by Jingxuan Zhu et al.

This paper studies a decentralized multi-armed bandit problem in a multi-agent network. The problem is solved simultaneously by N agents, assuming they face a common set of M arms and that each arm's reward has the same mean across agents. Each agent can receive information only from its neighbors, where the neighbor relations are described by a directed graph whose vertices represent agents and whose directed edges depict neighbor relations. A fully decentralized multi-armed bandit algorithm is proposed for each agent, which combines the classic consensus algorithm with the upper confidence bound (UCB) algorithm. It is shown that the algorithm guarantees that each agent achieves a better logarithmic asymptotic regret than the classic UCB, provided the neighbor graph is strongly connected. The regret can be further improved if the neighbor graph is undirected.
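The abstract's idea of interleaving consensus averaging with UCB arm selection can be illustrated with a small simulation. The sketch below is a simplified, hypothetical variant, not the paper's exact update rules: each agent mixes its neighbors' reward-mean estimates through one consensus step per round (using a row-stochastic weight matrix `W` standing in for the neighbor graph), then plays the arm with the highest UCB index built from the mixed estimate. All function and variable names here are illustrative assumptions.

```python
import math
import random

def run_decentralized_ucb(means, W, horizon, seed=0):
    """Simplified consensus-plus-UCB sketch (not the paper's exact algorithm).

    means:   true Bernoulli reward mean of each arm (shared by all agents)
    W:       row-stochastic N x N weight matrix encoding the neighbor graph
    horizon: number of rounds each agent plays
    Returns the cumulative regret accrued by each agent.
    """
    rng = random.Random(seed)
    N, M = len(W), len(means)
    z = [[0.0] * M for _ in range(N)]   # consensus estimate of each arm's mean
    cnt = [[0] * M for _ in range(N)]   # each agent's own pull counts
    regret = [0.0] * N
    best = max(means)
    for t in range(1, horizon + 1):
        # consensus step: z_i <- sum_j W[i][j] * z_j  (mix neighbors' estimates)
        z = [[sum(W[i][j] * z[j][k] for j in range(N)) for k in range(M)]
             for i in range(N)]
        for i in range(N):
            if t <= M:
                a = t - 1  # play every arm once so all counts are positive
            else:
                # UCB index on the mixed estimate, with the agent's own count
                a = max(range(M), key=lambda k: z[i][k]
                        + math.sqrt(2 * math.log(t) / cnt[i][k]))
            r = 1.0 if rng.random() < means[a] else 0.0
            cnt[i][a] += 1
            # local correction: move the mixed estimate toward the fresh reward
            z[i][a] += (r - z[i][a]) / cnt[i][a]
            regret[i] += best - means[a]
    return regret
```

Running this on a small complete graph (so `W` has all entries 1/N) shows every agent's regret growing far more slowly than linear play of a suboptimal arm would, which is the qualitative behavior the paper's regret bounds formalize.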




Related papers:

- Decentralized Cooperative Stochastic Multi-armed Bandits
- Federated Bandit: A Gossiping Approach
- Distributed learning in congested environments with partial information
- Decentralized Smart Charging of Large-Scale EVs using Adaptive Multi-Agent Multi-Armed Bandits
- A Note on the Equivalence of Upper Confidence Bounds and Gittins Indices for Patient Agents
- Piecewise-Stationary Multi-Objective Multi-Armed Bandit with Application to Joint Communications and Sensing
