Finite Continuum-Armed Bandits

10/23/2020
by   Solenne Gaucher, et al.

We consider a situation where an agent has T resources to allocate among a larger number N of actions. Each action can be completed at most once and yields a stochastic reward with unknown mean. The goal of the agent is to maximize her cumulative reward. Nontrivial strategies are possible when side information on the actions is available, for example in the form of covariates. Focusing on a nonparametric setting, where the mean reward is an unknown function of a one-dimensional covariate, we propose an optimal strategy for this problem. Under natural assumptions on the reward function, we prove that the optimal regret scales as O(T^{1/3}), up to poly-logarithmic factors, when the budget T is proportional to the number of actions N. When T becomes small compared to N, a smooth transition occurs: as the ratio T/N decreases from a constant to N^{-1/3}, the regret increases progressively up to the O(T^{1/2}) rate encountered in continuum-armed bandits.
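To make the setting concrete, here is a minimal simulation sketch. It is not the strategy proposed in the paper: the binning baseline, the mean reward function, the Bernoulli rewards, and every name and parameter value below (`simulate`, `n_bins`, `explore_per_bin`, and so on) are assumptions chosen purely for illustration. It only shows the constraint that each action is used at most once and how regret against an oracle that picks the T best actions could be measured.

```python
import random


def simulate(n_arms=3000, budget=1000, n_bins=10, explore_per_bin=20, seed=0):
    """Toy run of the setting: N arms, budget T, each arm pulled at most once.

    Uses a naive explore-then-exploit binning baseline (an assumption for
    illustration, not the paper's algorithm). Returns (regret, pulled_arms).
    """
    rng = random.Random(seed)

    # Hypothetical mean reward function of the one-dimensional covariate.
    def mean_reward(x):
        return 1.0 - abs(x - 0.5)

    covariates = [rng.random() for _ in range(n_arms)]
    means = [mean_reward(x) for x in covariates]

    # Group the indices of still-unused arms by covariate bin.
    bins = [[] for _ in range(n_bins)]
    for i, x in enumerate(covariates):
        bins[min(int(x * n_bins), n_bins - 1)].append(i)

    pulled = []
    totals = [0.0] * n_bins
    counts = [0] * n_bins

    def pull(b):
        # Popping the arm from its bin guarantees it is never pulled twice.
        i = bins[b].pop()
        pulled.append(i)
        reward = 1.0 if rng.random() < means[i] else 0.0  # Bernoulli reward
        totals[b] += reward
        counts[b] += 1

    # Exploration: sample a few arms from every bin.
    for b in range(n_bins):
        for _ in range(explore_per_bin):
            if bins[b] and len(pulled) < budget:
                pull(b)

    # Exploitation: spend the remaining budget on bins in decreasing order
    # of estimated mean reward.
    order = sorted(
        range(n_bins),
        key=lambda b: totals[b] / max(counts[b], 1),
        reverse=True,
    )
    for b in order:
        while bins[b] and len(pulled) < budget:
            pull(b)

    # Regret in expectation: the oracle pulls the `budget` arms with the
    # highest mean reward.
    oracle = sum(sorted(means, reverse=True)[:budget])
    achieved = sum(means[i] for i in pulled)
    return oracle - achieved, pulled


if __name__ == "__main__":
    regret, pulled = simulate()
    print(f"pulled {len(pulled)} distinct arms, regret = {regret:.2f}")
```

Because arms cannot be reused, even a well-estimated best bin eventually runs out of arms, which is what distinguishes this setting from a standard continuum-armed bandit.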


research
08/20/2019

How to gamble with non-stationary X-armed bandits and have no regrets

In X-armed bandit problem an agent sequentially interacts with environme...
research
08/10/2020

Lenient Regret for Multi-Armed Bandits

We consider the Multi-Armed Bandit (MAB) problem, where the agent sequen...
research
03/06/2020

A Farewell to Arms: Sequential Reward Maximization on a Budget with a Giving Up Option

We consider a sequential decision-making problem where an agent can take...
research
10/23/2018

Unifying the stochastic and the adversarial Bandits with Knapsack

This paper investigates the adversarial Bandits with Knapsack (BwK) onli...
research
02/16/2021

Making the most of your day: online learning for optimal allocation of time

We study online learning for optimal allocation when the resource to be ...
research
11/22/2018

Bandits with Temporal Stochastic Constraints

We study the effect of impairment on stochastic multi-armed bandits and ...
research
05/25/2023

Small Total-Cost Constraints in Contextual Bandits with Knapsacks, with Application to Fairness

We consider contextual bandit problems with knapsacks [CBwK], a problem ...
