Predictive Bandits

04/02/2020
by Simon Lindståhl, et al.

We introduce and study a new class of stochastic bandit problems, referred to as predictive bandits. In each round, the decision maker first decides whether to gather information about the rewards of particular arms (so that their rewards in this round can be predicted). These measurements are costly and may be corrupted by noise. The decision maker then selects an arm to be actually played in the round. Predictive bandits find applications in many areas; for example, they can be applied to channel selection problems in radio communication systems. In this paper, we provide the first theoretical results about predictive bandits, focusing on scenarios where the decision maker may measure at most one arm per round. We derive asymptotic instance-specific regret lower bounds for these problems, and develop algorithms whose regret matches these fundamental limits. We illustrate the performance of our algorithms through numerical experiments. In particular, we highlight the gains that can be achieved by using reward predictions, and investigate the impact of noise in the corresponding measurements.
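As a concrete illustration of the two-stage protocol described above, the following minimal Python sketch simulates a predictive bandit round loop. Everything here is an illustrative assumption: Gaussian rewards, Gaussian measurement noise of level sigma, a fixed per-measurement cost, a decaying measurement rate, and a simple measure-then-play heuristic. It is a hypothetical placeholder, not the asymptotically optimal algorithm developed in the paper.

import numpy as np

rng = np.random.default_rng(0)
K, T = 4, 5000             # arms and rounds (hypothetical sizes)
mu = rng.uniform(0, 1, K)  # unknown mean rewards
cost, sigma = 0.05, 0.3    # measurement cost and noise level (assumed)

means, plays = np.zeros(K), np.zeros(K)
reward_total = 0.0
for t in range(1, T + 1):
    rewards = mu + rng.normal(0, 0.2, K)  # this round's realized rewards
    ucb = means + np.sqrt(2 * np.log(t + 1) / np.maximum(plays, 1))
    target = int(np.argmax(ucb))
    # Stage 1: optionally buy a noisy prediction of one arm's reward,
    # measuring less often as estimates become reliable (heuristic rate).
    if rng.random() < 1 / np.sqrt(t):
        prediction = rewards[target] + rng.normal(0, sigma)
        reward_total -= cost
        # Stage 2: play the measured arm only if its prediction looks good;
        # otherwise fall back to the second-best arm by empirical mean.
        if prediction >= means[target]:
            arm = target
        else:
            arm = int(np.argsort(means)[-2])
    else:
        arm = target  # no measurement: plain UCB play
    reward_total += rewards[arm]
    plays[arm] += 1
    means[arm] += (rewards[arm] - means[arm]) / plays[arm]

print(f"average net reward: {reward_total / T:.3f}")

Each round the agent may buy at most one noisy prediction, pays the measurement cost when it does, and only then commits to an arm to play; this mirrors the two-stage decision structure of predictive bandits, with the prediction informing the play decision within the same round.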


Related research:

- 01/20/2021: Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits with Linear Payoff Functions. "The contextual combinatorial semi-bandit problem with linear payoff func..."
- 11/19/2018: Best-arm identification with cascading bandits. "We consider a variant of the problem of best arm identification in multi..."
- 01/19/2023: Decision-Focused Evaluation: Analyzing Performance of Deployed Restless Multi-Arm Bandits. "Restless multi-arm bandits (RMABs) is a popular decision-theoretic frame..."
- 05/20/2014: Unimodal Bandits: Regret Lower Bounds and Optimal Algorithms. "We consider stochastic multi-armed bandits where the expected reward is ..."
- 11/01/2017: Minimal Exploration in Structured Stochastic Bandits. "This paper introduces and addresses a wide class of stochastic bandit pr..."
- 11/27/2018: Rotting bandits are no harder than stochastic ones. "In bandits, arms' distributions are stationary. This is often violated i..."
- 06/01/2022: Contextual Bandits with Knapsacks for a Conversion Model. "We consider contextual bandits with knapsacks, with an underlying struct..."
