Context-lumpable stochastic bandits

06/22/2023
by Chung-Wei Lee, et al.

We consider a contextual bandit problem with S contexts and A actions. In each round t = 1, 2, …, the learner observes a random context and chooses an action based on its past experience. The learner then observes a random reward whose mean is a function of the context and the action chosen in that round. Under the assumption that the contexts can be lumped into r ≤ min{S, A} groups such that any two contexts in the same group have the same mean reward for every action, we give an algorithm that, with high probability, outputs an ϵ-optimal policy after using at most O(r(S+A)/ϵ^2) samples, and we provide a matching Ω(r(S+A)/ϵ^2) lower bound. In the regret minimization setting, we give an algorithm whose cumulative regret up to time T is bounded by O(√(r^3(S+A)T)). To the best of our knowledge, we are the first to show near-optimal sample complexity in the PAC setting and O(√(poly(r)(S+A)T)) minimax regret in the online setting for this problem. We also show that our algorithms can be applied to more general low-rank bandits and yield improved regret bounds in some scenarios.
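The lumpability assumption can be made concrete with a small synthetic instance. The sketch below is illustrative only and is not the algorithm from the paper: the helper names (`make_lumpable_instance`, `is_eps_optimal`), the uniform context distribution, and the reading of ϵ-optimality as "expected reward within ϵ of the best policy" are all assumptions introduced here. It shows that when contexts fall into r groups, the S × A mean-reward matrix has at most r distinct rows, and hence rank at most r.

```python
import numpy as np

def make_lumpable_instance(S, A, r, seed=0):
    """Build a hypothetical S x A mean-reward matrix whose contexts fall into r groups."""
    rng = np.random.default_rng(seed)
    # Assign every context to a group, making sure each group is non-empty.
    groups = np.concatenate([np.arange(r), rng.integers(0, r, size=S - r)])
    rng.shuffle(groups)
    # One mean-reward vector per group; contexts inherit the vector of their group.
    group_means = rng.uniform(0.0, 1.0, size=(r, A))
    mean_rewards = group_means[groups]
    return groups, group_means, mean_rewards

def is_eps_optimal(policy, mean_rewards, context_probs, eps):
    """Check whether a deterministic policy (one action per context) is eps-optimal.

    Assumed definition: the expected gap to the best action, averaged over the
    context distribution, is at most eps.
    """
    best = mean_rewards.max(axis=1)
    chosen = mean_rewards[np.arange(len(policy)), policy]
    return float(context_probs @ (best - chosen)) <= eps

S, A, r = 12, 8, 3
groups, group_means, mean_rewards = make_lumpable_instance(S, A, r)
context_probs = np.full(S, 1.0 / S)       # assumed uniform context distribution
greedy_policy = mean_rewards.argmax(axis=1)  # optimal policy given the true means

print("distinct rows:", len(np.unique(mean_rewards, axis=0)))
print("matrix rank:", np.linalg.matrix_rank(mean_rewards))
print("greedy policy is 0-optimal:",
      is_eps_optimal(greedy_policy, mean_rewards, context_probs, 0.0))
```

Because the optimal action depends only on the group, a learner that identifies the grouping needs to estimate roughly r reward vectors of length A plus S group memberships, which gives some intuition for why a sample complexity scaling with r(S+A) rather than SA is plausible.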


