Online Learning for Cooperative Multi-Player Multi-Armed Bandits

09/07/2021
by   William Chang, et al.

We introduce a framework for decentralized online learning in multi-armed bandits (MAB) with multiple cooperative players. The reward obtained by the players in each round depends on the actions taken by all the players. This is a team setting with a common objective; what makes the problem interesting and challenging is information asymmetry. We consider three types of information asymmetry: action information asymmetry, where the players cannot observe each other's actions but receive the same reward; reward information asymmetry, where the players can observe each other's actions but the rewards they receive are IID draws from the same distribution; and the combination of both action and reward information asymmetry. For the first setting, we propose a UCB-inspired algorithm that achieves O(log T) regret whether the rewards are IID or Markovian. For the second setting, we construct an environment in which the algorithm from the first setting incurs linear regret. For the third setting, we show that a variation of the `explore then commit' algorithm achieves near-logarithmic regret.
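As background for the O(log T) claim above, here is a minimal single-player sketch of the classic UCB1 index rule that the paper's "UCB-inspired" algorithm builds on. The decentralized multi-player coordination and information asymmetry described in the abstract are not modeled here; the Bernoulli arm means, horizon, and seed are illustrative assumptions, not values from the paper.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Run the standard UCB1 policy on IID Bernoulli arms.

    At each round, pull the arm maximizing
    empirical mean + sqrt(2 * log(t) / pulls),
    which balances exploitation against exploration and yields
    O(log T) regret in the stochastic single-player setting.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms     # number of pulls per arm
    sums = [0.0] * n_arms     # cumulative reward per arm
    total_reward = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1       # pull each arm once to initialize
        else:
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward
    return total_reward, counts

reward, counts = ucb1([0.2, 0.5, 0.9], horizon=5000)
# Regret relative to always pulling the best arm (mean 0.9)
regret = 0.9 * 5000 - reward
```

With a logarithmic number of pulls wasted on suboptimal arms, the best arm dominates the pull counts and the cumulative regret grows like log T rather than linearly in T.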


Related research

10/26/2018 · Game of Thrones: Fully Distributed Learning for Multi-Player Bandits
We consider a multi-armed bandit game where N players compete for M arms...

01/27/2023 · Decentralized Online Bandit Optimization on Directed Graphs with Regret Bounds
We consider a decentralized multiplayer game, played over T rounds, with...

10/23/2018 · Unifying the stochastic and the adversarial Bandits with Knapsack
This paper investigates the adversarial Bandits with Knapsack (BwK) onli...

08/04/2015 · Staged Multi-armed Bandits
In this paper, we introduce a new class of reinforcement learning method...

10/22/2019 · Restless Hidden Markov Bandits with Linear Rewards
This paper presents an algorithm and regret analysis for the restless hi...

08/21/2013 · Distributed Online Learning via Cooperative Contextual Bandits
In this paper we propose a novel framework for decentralized, online lea...

06/13/2011 · From Bandits to Experts: On the Value of Side-Observations
We consider an adversarial online learning setting where a decision make...
