Multi-armed Bandits with Compensation

11/05/2018
by Siwei Wang, et al.

We propose and study the known-compensation multi-arm bandit (KCMAB) problem, where a system controller offers a set of arms to many short-term players for T steps. In each step, one short-term player arrives at the system. Upon arrival, the player aims to select the arm with the current best average reward and receives a stochastic reward associated with that arm. To incentivize players to explore other arms, the controller pays them an appropriate compensation. The objective of the controller is to maximize the total reward collected by players while minimizing the compensation. We first provide a compensation lower bound Θ(∑_i Δ_i log T / KL_i), where Δ_i and KL_i are the expected reward gap and the Kullback-Leibler (KL) divergence between the distributions of arm i and the best arm, respectively. We then analyze three algorithms for solving the KCMAB problem and obtain their regrets and compensations. We show that all three algorithms achieve O(log T) regret and O(log T) compensation, matching the theoretical lower bound. Finally, we present experimental results to demonstrate the performance of the algorithms.
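The setting described in the abstract can be illustrated with a small simulation. The sketch below is not one of the three algorithms analyzed in the paper, which the abstract does not name; it assumes a standard UCB1 controller, Bernoulli rewards, and a compensation rule in which the controller pays the player the difference between the best empirical mean and the empirical mean of the arm it asks the player to pull. The function name `simulate_kcmab`, its parameters, and the choice to pay no compensation during the initial round of pulls are all hypothetical choices made for illustration.

```python
import math
import random


def simulate_kcmab(means, horizon, seed=0):
    """Simulate a UCB1 controller that compensates myopic players.

    Assumptions (not taken from the source): Bernoulli arms, UCB1 as the
    controller's index rule, and compensation equal to the gap in empirical
    means between the player's greedy arm and the controller's target arm.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # total observed reward per arm
    best_mean = max(means)
    regret = 0.0          # pseudo-regret measured against the true best arm
    compensation = 0.0    # total payment made to players

    for t in range(1, horizon + 1):
        if t <= k:
            # Initial round: pull each arm once. Empirical means are not yet
            # defined for every arm, so this sketch charges no compensation.
            target = t - 1
        else:
            emp = [sums[i] / counts[i] for i in range(k)]
            # Controller's choice: the arm with the largest UCB1 index.
            target = max(
                range(k),
                key=lambda i: emp[i] + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
            # A myopic player prefers the arm with the best empirical mean,
            # so the controller pays the perceived loss from switching.
            compensation += max(emp) - emp[target]

        reward = 1.0 if rng.random() < means[target] else 0.0
        counts[target] += 1
        sums[target] += reward
        regret += best_mean - means[target]

    return {"regret": regret, "compensation": compensation, "counts": counts}


if __name__ == "__main__":
    for horizon in (1_000, 10_000, 100_000):
        result = simulate_kcmab([0.9, 0.8, 0.5], horizon, seed=1)
        print(horizon, round(result["regret"], 1), round(result["compensation"], 1))
```

Running the script for increasing horizons gives a rough way to see how regret and compensation grow with T under these assumptions; it is not a substitute for the paper's analysis or experiments.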

research
12/12/2022

Decentralized Stochastic Multi-Player Multi-Armed Walking Bandits

Multi-player multi-armed bandit is an increasingly relevant decision-mak...
research
02/04/2019

New Algorithms for Multiplayer Bandits when Arm Means Vary Among Players

We study multiplayer stochastic multi-armed bandit problems in which the...
research
08/03/2019

Multiplayer Bandit Learning, from Competition to Cooperation

The stochastic multi-armed bandit problem is a classic model illustratin...
research
11/15/2022

Multi-Player Bandits Robust to Adversarial Collisions

Motivated by cognitive radios, stochastic Multi-Player Multi-Armed Bandi...
research
09/06/2022

Multi-Armed Bandits with Self-Information Rewards

This paper introduces the informational multi-armed bandit (IMAB) model ...
research
04/08/2021

Incentivizing Exploration in Linear Bandits under Information Gap

We study the problem of incentivizing exploration for myopic users in li...
research
02/19/2022

The Pareto Frontier of Instance-Dependent Guarantees in Multi-Player Multi-Armed Bandits with no Communication

We study the stochastic multi-player multi-armed bandit problem. In this...
