Streaming Algorithms for Stochastic Multi-armed Bandits

12/09/2020
by Arnab Maiti, et al.

We study the stochastic multi-armed bandit problem under bounded arm-memory. In this setting, the arms arrive in a stream, and the number of arms that can be stored in memory at any time is bounded. The decision-maker can only pull arms that are currently in memory. We address the problem from the perspective of two standard objectives: 1) regret minimization, and 2) best-arm identification. For regret minimization, we settle an important open question by proving an almost tight hardness result: we show Ω(T^(2/3)) expected cumulative regret for an arm-memory size of (n-1), where n is the number of arms. For best-arm identification, we study two algorithms. First, we present an r-round adaptive streaming algorithm that uses O(r) arm-memory to find an ϵ-best arm. In an r-round adaptive streaming algorithm for best-arm identification, the arm pulls in each round are decided based on the outcomes observed in earlier rounds, and the best arm is output at the end of the r rounds. The upper bound on the sample complexity of our algorithm matches the lower bound for any r-round adaptive streaming algorithm. Second, we present a heuristic that finds an ϵ-best arm with optimal sample complexity while storing only one extra arm in memory.
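To make the bounded arm-memory setting concrete, the following is a minimal sketch of a single pass over an arm stream that keeps one stored "champion" arm plus the newly arrived arm. It is a naive illustration only, not the heuristic or the r-round algorithm from the paper, and it makes no sample-complexity or correctness claim. The Bernoulli reward model and the names `pull`, `streaming_champion`, `arm_means`, and `pulls_per_arm` are assumptions introduced for this example.

```python
import math
import random


def pull(mean, rng):
    """Draw one Bernoulli reward from an arm with the given (hidden) mean."""
    return 1.0 if rng.random() < mean else 0.0


def streaming_champion(arm_means, pulls_per_arm, seed=0):
    """Naive one-pass selection: hold the champion and at most one new arm.

    arm_means stands in for the stream; the means are used only to
    simulate rewards and are never read by the selection logic itself.
    Returns (index of the final champion, peak number of arms in memory).
    """
    rng = random.Random(seed)
    champion = None            # index of the arm currently kept in memory
    champion_avg = -math.inf   # empirical mean recorded for that arm
    max_in_memory = 0

    for idx, mean in enumerate(arm_means):
        # The arriving arm takes the single extra memory slot.
        in_memory = 1 if champion is None else 2
        max_in_memory = max(max_in_memory, in_memory)

        # Only arms in memory can be pulled; sample the new arm a fixed
        # number of times (a simplification chosen for this sketch).
        total = sum(pull(mean, rng) for _ in range(pulls_per_arm))
        avg = total / pulls_per_arm

        # Keep whichever arm looks better; the other is discarded for good.
        if avg > champion_avg:
            champion, champion_avg = idx, avg

    return champion, max_in_memory


if __name__ == "__main__":
    best, peak = streaming_champion([0.1, 0.9, 0.2], pulls_per_arm=2000)
    print("champion index:", best, "peak arms in memory:", peak)
```

A discarded arm can never be pulled again, which is what separates this setting from the classical one where all n arms stay available. A fixed per-arm budget like the one above is exactly the kind of choice that the sample-complexity bounds in the paper refine.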

