On the Performance of Thompson Sampling on Logistic Bandits

05/12/2019
by Shi Dong, et al.

We study the logistic bandit, in which rewards are binary with success probability exp(β a^⊤θ) / (1 + exp(β a^⊤θ)) and actions a and coefficients θ are within the d-dimensional unit ball. While prior regret bounds for algorithms that address the logistic bandit exhibit exponential dependence on the slope parameter β, we establish a regret bound for Thompson sampling that is independent of β. Specifically, we establish that, when the set of feasible actions is identical to the set of possible coefficient vectors, the Bayesian regret of Thompson sampling is Õ(d√T). We also establish an Õ(√(dηT)/λ) bound that applies more broadly, where λ is the worst-case optimal log-odds and η is the "fragility dimension," a new statistic we define to capture the degree to which an optimal action for one model fails to satisfice for others. We demonstrate that the fragility dimension plays an essential role by showing that, for any ϵ > 0, no algorithm can achieve poly(d, 1/λ) · T^(1-ϵ) regret.
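For intuition, the following is a minimal sketch of Thompson sampling in this setting, assuming a finite discretization of the unit ball so that the posterior can be maintained exactly; the action set is taken equal to the model set, as in the paper's first setting. All names and parameter values (n_models, beta, T, and so on) are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

d, beta, T = 3, 5.0, 2000      # dimension, slope parameter, horizon (illustrative)
n_models = 200                 # size of the discretized model/action set (assumption)

# Candidate coefficient vectors drawn uniformly from the d-dimensional unit ball;
# the action set equals the model set, matching the paper's first setting.
raw = rng.normal(size=(n_models, d))
models = raw / np.linalg.norm(raw, axis=1, keepdims=True)
models *= rng.uniform(size=(n_models, 1)) ** (1.0 / d)
actions = models

theta_star = models[0]         # true coefficients, unknown to the agent
log_post = np.zeros(n_models)  # log-posterior, starting from a uniform prior

def success_prob(a, theta):
    """P(reward = 1 | a, theta) = exp(beta a.theta) / (1 + exp(beta a.theta))."""
    return 1.0 / (1.0 + np.exp(-beta * a @ theta))

best = max(success_prob(a, theta_star) for a in actions)
regret = 0.0

for t in range(T):
    # Thompson step: sample a model from the posterior, act greedily for it.
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    theta = models[rng.choice(n_models, p=post)]
    a = actions[np.argmax(actions @ theta)]  # logistic reward is monotone in a.theta

    # Observe a binary reward and update the posterior by Bayes' rule.
    r = rng.random() < success_prob(a, theta_star)
    p_all = 1.0 / (1.0 + np.exp(-beta * models @ a))
    log_post += np.log(np.clip(p_all if r else 1.0 - p_all, 1e-12, None))

    regret += best - success_prob(a, theta_star)

print(f"cumulative regret after {T} rounds: {regret:.2f}")
```

Under this discretization, running the sketch with larger β leaves the regret growth essentially unchanged, which is the qualitative behavior the paper's β-independent bound describes.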


Related research

An Information-Theoretic Analysis for Thompson Sampling with Many Actions (05/30/2018)
Thompson Sampling for Multinomial Logit Contextual Bandits (06/07/2020)
A General Framework to Analyze Stochastic Linear Bandit (02/12/2020)
Effects of Model Misspecification on Bayesian Bandits: Case Studies in UX Optimization (10/07/2020)
Satisficing in Time-Sensitive Bandit Learning (03/07/2018)
Efficient Linear Bandits through Matrix Sketching (09/28/2018)
