Regret Minimisation in Multi-Armed Bandits Using Bounded Arm Memory

01/24/2019
by Arghya Roy Chaudhuri, et al.

In this paper, we propose a constant-word (RAM model) algorithm for regret minimisation for both finite and infinite Stochastic Multi-Armed Bandit (MAB) instances. Most existing regret-minimisation algorithms need to remember the statistics of every arm they encounter, which becomes problematic when the number of available words of memory is limited. Designing an efficient regret-minimisation algorithm that uses only a constant number of words has long been of interest to the community. Early attempts consider infinitely many arms and require the reward distributions of the arms to belong to a particular family. Recently, an explore-then-commit based algorithm for finitely many-armed bandits (Liau et al., 2018) avoids such assumptions; however, due to its underlying PAC-based elimination, that method incurs high regret. We present a conceptually simple and efficient algorithm that needs to remember the statistics of at most M arms, and for any K-armed finite bandit instance it enjoys an O(KM + K^{1.5}√(T log(T/MK))/M) upper bound on regret over horizon T. We extend it to achieve sub-linear quantile regret (Roy Chaudhuri and Kalyanakrishnan, 2018) and empirically verify the efficiency of our algorithm via experiments.
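The abstract does not spell out the algorithm, but the bounded-arm-memory constraint it describes can be illustrated with a short sketch. The Python below is an illustrative toy, not the authors' method: it keeps (pull count, mean reward) statistics for at most M arms at a time, plays a UCB1 index over that working set, and, once every retained arm has used a fixed exploration budget, evicts the empirically worst arm in favour of a previously unseen one. The function names, the UCB1 index, and the replacement schedule are all assumptions made for illustration.

```python
# A minimal sketch of the bounded-arm-memory idea, NOT the paper's algorithm.
# Statistics (pull count, running mean) are stored for at most M arms at a
# time; arms outside the working set are forgotten entirely. The working set
# is refreshed by swapping the empirically worst arm for an unseen one once
# every retained arm has exhausted a fixed per-arm budget (an assumed schedule).
import math
import random

def bounded_memory_bandit(pull, K, M, T, budget=200, seed=0):
    """pull(i) -> stochastic reward in [0, 1] for arm i; K arms, horizon T.
    Only M (count, mean) pairs are ever held in memory at once."""
    rng = random.Random(seed)
    unseen = list(range(K))
    rng.shuffle(unseen)
    # working set: arm index -> [pulls, mean reward]; at most M entries
    stats = {unseen.pop(): [0, 0.0] for _ in range(min(M, K))}
    total = 0.0
    for t in range(1, T + 1):
        # UCB1 index over the arms currently held in memory;
        # an unpulled arm gets infinite index so it is tried first
        arm = max(stats, key=lambda a: float("inf") if stats[a][0] == 0
                  else stats[a][1] + math.sqrt(2 * math.log(t) / stats[a][0]))
        r = pull(arm)
        n, mu = stats[arm]
        stats[arm] = [n + 1, mu + (r - mu) / (n + 1)]  # incremental mean
        total += r
        # once every retained arm has used its budget, evict the empirically
        # worst arm and admit a fresh one (if any unseen arms remain)
        if unseen and all(s[0] >= budget for s in stats.values()):
            worst = min(stats, key=lambda a: stats[a][1])
            del stats[worst]                 # its statistics are forgotten
            stats[unseen.pop()] = [0, 0.0]
    return total

if __name__ == "__main__":
    # toy Bernoulli instance: 8 arms, memory for only 3 at a time
    means = [0.3, 0.5, 0.9, 0.4, 0.6, 0.2, 0.7, 0.1]
    reward = bounded_memory_bandit(
        lambda i: 1.0 if random.random() < means[i] else 0.0,
        K=len(means), M=3, T=20000)
    print(f"average reward: {reward / 20000:.3f}")
```

The point of the sketch is the memory profile, not the regret guarantee: at every step the dictionary holds at most M entries, so the algorithm uses O(M) words regardless of K, matching the constraint the paper studies.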
