Multi-Armed Bandits with Censored Consumption of Resources

11/02/2020
by Viktor Bengs, et al.

We consider a resource-aware variant of the classical multi-armed bandit problem: in each round, the learner selects an arm and sets a resource limit. It then observes the corresponding (random) reward, provided the (random) amount of consumed resources remains below the limit; otherwise, the observation is censored, i.e., no reward is obtained. For this setting, we introduce a measure of regret that incorporates both the amount of resources allocated in each learning round and the optimality of realizable rewards. To minimize regret, the learner must therefore choose an arm and a resource limit such that the chance of realizing a high reward within that limit is high, while keeping the limit itself as low as possible. We derive a theoretical lower bound on the cumulative regret and propose a learning algorithm whose regret upper bound matches this lower bound. In a simulation study, we show that our algorithm outperforms straightforward extensions of standard multi-armed bandit algorithms.
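To make the interaction protocol concrete, below is a minimal, self-contained Python sketch of one such censored-feedback loop. The arm distributions (exponential consumption, Gaussian rewards), the grid of candidate limits, and the uniform-random baseline policy are all illustrative assumptions; the paper's actual regret definition and learning algorithm are not reproduced here.

```python
import random

# Hypothetical sketch of the censored-consumption bandit protocol
# described in the abstract. Distributions and the baseline policy
# are assumptions for illustration only.

rng = random.Random(0)

# Two hypothetical arms: (mean reward, mean resource consumption).
ARMS = [(1.0, 0.5), (1.5, 2.0)]

def play(arm, limit):
    """One round: pull an arm under a resource limit, return the
    observed reward, or None if the observation is censored."""
    mean_reward, mean_consumption = ARMS[arm]
    consumption = rng.expovariate(1.0 / mean_consumption)  # assumed distribution
    reward = rng.gauss(mean_reward, 0.1)                   # assumed distribution
    if consumption <= limit:
        return reward  # consumption stayed below the limit: reward realized
    return None        # limit exceeded: censored, no reward observed

# Naive baseline: pick each (arm, limit) pair uniformly at random.
LIMITS = [0.5, 1.0, 2.0, 4.0]
total_reward, total_allocated = 0.0, 0.0
for t in range(1000):
    arm = rng.randrange(len(ARMS))
    limit = rng.choice(LIMITS)
    obs = play(arm, limit)
    total_allocated += limit  # allocated resources are spent even when censored
    if obs is not None:
        total_reward += obs

print(f"realized reward: {total_reward:.1f}, allocated resources: {total_allocated:.1f}")
```

A real learner would replace the uniform-random choice with estimates of each arm's reward and consumption distributions, trading the probability of realizing a reward within the limit against the cost of allocating a larger limit, which is exactly the tension the regret measure above is designed to capture.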


