Generalizing distribution of partial rewards for multi-armed bandits with temporally-partitioned rewards

11/13/2022
by   Ronald C. van den Broek, et al.

We investigate the Multi-Armed Bandit problem with Temporally-Partitioned Rewards (TP-MAB). In this setting, an agent receives subsets of an arm's reward over multiple rounds rather than the entire reward all at once. We introduce a general formulation of how an arm's cumulative reward is distributed across several rounds, called the Beta-spread property. This generalization is needed to handle partitioned rewards in which the maximum reward per round is not distributed uniformly across rounds. We derive a lower bound for the TP-MAB problem under the assumption that the Beta-spread property holds. Moreover, we provide an algorithm, TP-UCB-FR-G, which exploits the Beta-spread property to improve the regret upper bound in some scenarios. By generalizing how the cumulative reward is distributed, this setting applies to a broader range of applications.
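As a minimal illustration of the partitioned-reward idea described in the abstract, the sketch below splits an arm's cumulative reward across several future rounds. The function name, the uniform default, and the weighted split are illustrative assumptions of ours, not the paper's exact Beta-spread formulation; the weighted case merely mimics a non-uniform per-round distribution.

```python
def partition_reward(total_reward, tau, weights=None):
    """Split an arm's cumulative reward across tau future rounds.

    With no weights the reward is spread uniformly; passing weights
    illustrates the more general, non-uniform case that motivates the
    Beta-spread property (illustrative sketch, not the paper's model).
    """
    if weights is None:
        weights = [1.0] * tau
    total_weight = sum(weights)
    # Each round receives a fraction of the cumulative reward
    # proportional to its weight.
    return [total_reward * w / total_weight for w in weights]


# Uniform partition: a reward of 10 over 4 rounds.
uniform_parts = partition_reward(10.0, 4)        # [2.5, 2.5, 2.5, 2.5]

# Non-uniform partition: most of the reward arrives in later rounds.
skewed_parts = partition_reward(6.0, 3, weights=[1, 2, 3])  # [1.0, 2.0, 3.0]
```

In both cases the partial rewards sum back to the arm's cumulative reward; the learner only observes the per-round pieces as they arrive, which is what distinguishes TP-MAB from the classical bandit setting.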


Related research

06/01/2022 · Multi-Armed Bandit Problem with Temporally-Partitioned Rewards: When Partial Feedback Counts
There is a rising interest in industrial online applications where data ...

03/01/2023 · Multi-Armed Bandits with Generalized Temporally-Partitioned Rewards
Decision-making problems of sequential nature, where decisions made in t...

12/13/2021 · Top K Ranking for Multi-Armed Bandit with Noisy Evaluations
We consider a multi-armed bandit setting where, at the beginning of each...

01/13/2022 · Contextual Bandits for Advertising Campaigns: A Diffusion-Model Independent Approach (Extended Version)
Motivated by scenarios of information diffusion and advertising in socia...

06/30/2016 · Asymptotically Optimal Algorithms for Budgeted Multiple Play Bandits
We study a generalization of the multi-armed bandit problem with multipl...

08/21/2020 · Near Optimal Adversarial Attack on UCB Bandits
We consider a stochastic multi-arm bandit problem where rewards are subj...

03/01/2023 · Containing a spread through sequential learning: to exploit or to explore?
The spread of an undesirable contact process, such as an infectious dise...
