Making the most of your day: online learning for optimal allocation of time

02/16/2021
by Etienne Boursier, et al.

We study online learning for optimal allocation when the resource to be allocated is time. Possible applications include a driver filling a day with rides or a landlord renting out an estate. Following our initial motivation, a driver receives ride proposals sequentially according to a Poisson process and can either accept or reject each proposed ride. If she accepts a proposal, she is busy for the duration of the ride and obtains a reward that depends on the ride duration. If she rejects it, she remains on hold until a new ride proposal arrives. We study the regret incurred by the driver, first when she knows her reward function but not the distribution of ride durations, and then when she does not know her reward function either. Finally, faster rates are obtained by adding structural assumptions on the distribution of rides or on the reward function. This natural setting bears similarities with contextual (one-armed) bandits, with the crucial difference that the normalized reward associated with a context depends on the whole distribution of contexts.
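To make the setting concrete, here is a minimal simulation sketch of a driver's day. It is an illustration under stated assumptions, not the algorithm from the paper: the names `simulate_day` and `threshold_policy`, the uniform ride-duration distribution, the square-root reward function, and the rule "accept a ride when its reward is at least `c` times its duration" are all hypothetical choices made for this example. The sketch also assumes that a ride accepted before the end of the day is paid in full even if it finishes afterwards.

```python
import math
import random


def threshold_policy(reward, c):
    """Hypothetical rule: accept a ride of duration x when reward(x) >= c * x."""
    return lambda x: reward(x) >= c * x


def simulate_day(horizon, rate, sample_duration, reward, accept, seed=0):
    """Simulate one day of length `horizon` and return the total reward.

    Proposals arrive as a Poisson process with intensity `rate` while the
    driver is idle; proposals arriving during a ride are never seen.
    """
    rng = random.Random(seed)
    t = 0.0
    total = 0.0
    while True:
        # Exponential waiting times between proposals give Poisson arrivals.
        # By memorylessness, the same draw is valid right after a ride ends.
        t += rng.expovariate(rate)
        if t >= horizon:
            break
        x = sample_duration(rng)  # duration of the proposed ride
        if accept(x):
            # Assumption: the full reward is earned even if the ride
            # ends after the horizon.
            total += reward(x)
            t += x  # the driver is busy for the duration of the ride
        # On rejection the driver simply stays on hold for the next proposal.
    return total


if __name__ == "__main__":
    def reward(x):
        return math.sqrt(x)  # assumed concave reward function

    def sample_duration(rng):
        return rng.uniform(0.1, 2.0)  # assumed duration distribution

    for c in (0.0, 0.5, 1.0, 1.5):
        policy = threshold_policy(reward, c)
        gains = [
            simulate_day(100.0, 2.0, sample_duration, reward, policy, seed=s)
            for s in range(200)
        ]
        print(f"threshold c={c:.1f}: mean reward {sum(gains) / len(gains):.2f}")
```

Running the script compares a few thresholds by their average reward over simulated days. The value of accepting a given ride depends on which other rides could fill the same time, which is one way to read the abstract's remark that the normalized reward of a context depends on the whole distribution of contexts.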


