Solving Multi-Arm Bandit Using a Few Bits of Communication

11/11/2021
by Osama A. Hanna, et al.

The multi-armed bandit (MAB) problem is an active learning framework that aims to select the best action from a set of actions by sequentially observing rewards. Recently, it has become popular for a number of applications over wireless networks, where communication constraints can form a bottleneck. Existing works usually fail to address this issue and can become infeasible in certain applications. In this paper, we address the communication problem by optimizing the communication of rewards collected by distributed agents. By providing nearly matching upper and lower bounds, we tightly characterize the number of bits needed per reward for the learner to learn accurately without suffering additional regret. In particular, we establish a generic reward quantization algorithm, QuBan, that can be applied on top of any (no-regret) MAB algorithm to form a new communication-efficient counterpart that requires only a few (as few as 3) bits to be sent per iteration while preserving the same regret bound. Our lower bound is established by constructing hard instances from a subgaussian distribution. Our theory is further corroborated by numerical experiments.
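The abstract does not describe how QuBan encodes rewards, so the sketch below only illustrates the general setting it targets: an agent observes a real-valued reward, transmits a few bits, and an unmodified bandit learner works from the reconstructed values. It is not QuBan. It swaps in plain unbiased stochastic rounding onto a fixed grid of 2**bits points, assumes rewards lie in a known interval [0, 1], and uses a hypothetical uniform-noise reward model. The names `quantize`, `dequantize`, and `ucb_quantized` are illustrative, not from the paper.

```python
import math
import random


def quantize(reward, bits, low=0.0, high=1.0, rng=random):
    """Agent side: map a reward in [low, high] to an integer that fits in `bits` bits.

    Unbiased stochastic rounding onto 2**bits evenly spaced grid points, so the
    expected reconstructed value equals the (clipped) reward.
    """
    levels = 2 ** bits - 1                  # gaps between grid points; indices 0..levels
    x = min(max(reward, low), high)         # assumption: rewards have a known range
    scaled = (x - low) / (high - low) * levels
    lower = math.floor(scaled)
    # Round up with probability equal to the fractional part.
    return lower + (1 if rng.random() < scaled - lower else 0)


def dequantize(index, bits, low=0.0, high=1.0):
    """Learner side: map a received index back to its grid point in [low, high]."""
    levels = 2 ** bits - 1
    return low + index / levels * (high - low)


def ucb_quantized(arm_means, horizon, bits=3, seed=0):
    """Standard UCB1 whose reward estimates are built only from quantized rewards."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k
    sums = [0.0] * k                        # sums of reconstructed rewards per arm

    def ucb_index(a, t):
        # Empirical mean plus the UCB1 exploration bonus.
        return sums[a] / counts[a] + math.sqrt(2 * math.log(t + 1) / counts[a])

    for t in range(horizon):
        if t < k:
            arm = t                         # pull each arm once to initialize
        else:
            arm = max(range(k), key=lambda a: ucb_index(a, t))
        # Hypothetical reward model: uniform noise of width 0.4 around the arm mean.
        reward = rng.uniform(arm_means[arm] - 0.2, arm_means[arm] + 0.2)
        index = quantize(reward, bits, rng=rng)   # only this index is transmitted
        sums[arm] += dequantize(index, bits)
        counts[arm] += 1
    return counts
```

For example, `ucb_quantized([0.3, 0.5, 0.7], horizon=5000, bits=3)` returns the number of pulls per arm, and one would expect the 0.7 arm to receive most of them. Because stochastic rounding is unbiased, each arm's average of reconstructed rewards still estimates its true mean. The cost is extra variance per reward, and how many bits suffice to avoid paying for it in regret is the question the paper's bounds characterize.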

research
08/15/2023

Regret Lower Bounds in Multi-agent Multi-armed Bandit

Multi-armed Bandit motivates methods with provable upper bounds on regre...
research
11/02/2020

Multi-Armed Bandits with Censored Consumption of Resources

We consider a resource-aware variant of the classical multi-armed bandit...
research
10/29/2020

Multitask Bandit Learning through Heterogeneous Feedback Aggregation

In many real-world applications, multiple agents seek to learn how to pe...
research
04/25/2023

Communication-Constrained Bandits under Additive Gaussian Noise

We study a distributed stochastic multi-armed bandit where a client supp...
research
02/10/2023

Piecewise-Stationary Multi-Objective Multi-Armed Bandit with Application to Joint Communications and Sensing

We study a multi-objective multi-armed bandit problem in a dynamic envir...
research
04/30/2023

ICQ: A Quantization Scheme for Best-Arm Identification Over Bit-Constrained Channels

We study the problem of best-arm identification in a distributed variant...
research
12/02/2018

Quick Best Action Identification in Linear Bandit Problems

In this paper, we consider a best action identification problem in the s...