Breaking the √(T) Barrier: Instance-Independent Logarithmic Regret in Stochastic Contextual Linear Bandits

05/19/2022
by Avishek Ghosh, et al.

We prove an instance-independent (poly)logarithmic regret bound for stochastic contextual bandits with linear payoffs. Previously, in <cit.>, a lower bound of Ω(√(T)) was shown for the contextual linear bandit problem with arbitrary (adversarially chosen) contexts. In this paper, we show that stochastic contexts indeed help to reduce the regret from √(T) to polylog(T). We propose Low Regret Stochastic Contextual Bandits, an algorithm that takes advantage of the stochastic contexts and performs parameter estimation (in ℓ_2 norm) and regret minimization simultaneously. The algorithm works in epochs, where the parameter estimate from the previous epoch is used to reduce the regret of the current epoch. Its (poly)logarithmic regret stems from two crucial facts: (a) the application of a norm-adaptive algorithm to exploit the parameter estimation and (b) an analysis of the shifted linear contextual bandit algorithm, showing that shifting results in increasing regret. We also show experimentally that stochastic contexts indeed incur a regret that scales with polylog(T).
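
The epoch structure described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's algorithm: the function names (`run_epochs`, `solve`, `dot`), the Gaussian contexts, the doubling epoch lengths, and the noise level are all hypothetical choices, and a plain ridge-regression estimate with greedy arm selection stands in for the norm-adaptive subroutine. What it does show is the shift: within each epoch the learner regresses on the reward minus the previous epoch's prediction, so it only has to learn the (small) residual parameter.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def run_epochs(theta_star, n_arms=5, n_epochs=6, first_len=50, noise=0.1, seed=0):
    """Simulate an epoch-based, shifted linear bandit (illustrative assumptions only).

    Returns one tuple per epoch: (epoch length, l2 estimation error, regret in epoch).
    """
    rng = random.Random(seed)
    d = len(theta_star)
    theta_prev = [0.0] * d  # estimate handed over from the previous epoch
    history = []
    length = first_len
    for _ in range(n_epochs):
        # Ridge statistics for the shifted parameter theta_star - theta_prev.
        A = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
        b = [0.0] * d
        regret = 0.0
        for _ in range(length):
            # Stochastic contexts: each arm's feature vector is drawn i.i.d. Gaussian.
            arms = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n_arms)]
            delta = solve(A, b)  # current estimate of the residual parameter
            theta_hat = [p + q for p, q in zip(theta_prev, delta)]
            # Greedy choice (a stand-in for the paper's norm-adaptive algorithm).
            x = max(arms, key=lambda a: dot(a, theta_hat))
            reward = dot(x, theta_star) + rng.gauss(0, noise)
            # Shift: subtract what the previous epoch's estimate already explains.
            shifted = reward - dot(x, theta_prev)
            for i in range(d):
                b[i] += shifted * x[i]
                for j in range(d):
                    A[i][j] += x[i] * x[j]
            regret += max(dot(a, theta_star) for a in arms) - dot(x, theta_star)
        # Hand the refined estimate to the next epoch and double the epoch length.
        delta = solve(A, b)
        theta_prev = [p + q for p, q in zip(theta_prev, delta)]
        err = sum((p - q) ** 2 for p, q in zip(theta_prev, theta_star)) ** 0.5
        history.append((length, err, regret))
        length *= 2
    return history

if __name__ == "__main__":
    for length, err, regret in run_epochs([1.0, -0.5]):
        print(f"epoch length {length:5d}  l2 error {err:.4f}  epoch regret {regret:.3f}")
```

Running the script prints, per epoch, the ℓ_2 estimation error and the regret accumulated in that epoch, which makes it easy to see how a better estimate from one epoch affects the next. The simulation makes no claim about matching the paper's regret bound.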

research
03/05/2020

Stochastic Linear Contextual Bandits with Diverse Contexts

In this paper, we investigate the impact of context diversity on stochas...
research
01/28/2019

Target Tracking for Contextual Bandits: Application to Demand Side Management

We propose a contextual-bandit approach for demand side management by of...
research
05/02/2023

Stochastic Contextual Bandits with Graph-based Contexts

We naturally generalize the on-line graph prediction problem to a versio...
research
10/15/2019

Adaptive Exploration in Linear Contextual Bandit

Contextual bandits serve as a fundamental model for many sequential deci...
research
06/11/2020

Bandits with Partially Observable Offline Data

We study linear contextual bandits with access to a large, partially obs...
research
07/11/2016

Kernel-based methods for bandit convex optimization

We consider the adversarial convex bandit problem and we build the first...
research
03/04/2020

Taking a hint: How to leverage loss predictors in contextual bandits?

We initiate the study of learning in contextual bandits with the help of...