Thompson Sampling for Cascading Bandits

10/02/2018
by Wang Chi Cheung, et al.

We design and analyze TS-Cascade, a Thompson sampling algorithm for the cascading bandit problem. In TS-Cascade, Bayesian estimates of the click probability are constructed using a univariate Gaussian; this leads to a more efficient exploration procedure vis-à-vis existing UCB-based approaches. We also incorporate the empirical variance of each item's click probability into the Bayesian updates. These two novel features allow us to prove an expected regret bound of the form Õ(√(KLT)), where L and K are the number of ground items and the number of items in the chosen list, respectively, and T > L is the number of Thompson sampling update steps. This matches the state-of-the-art regret bounds for UCB-based algorithms. More importantly, it is the first theoretical guarantee for a Thompson sampling algorithm on any stochastic combinatorial bandit problem with partial feedback. Empirical experiments demonstrate the superiority of TS-Cascade over existing UCB-based procedures in terms of both expected cumulative regret and time complexity.
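As a rough illustration of the procedure the abstract describes, below is a minimal Python sketch of a TS-Cascade-style learner on a simulated cascade model. The two ingredients named in the abstract, a single shared univariate Gaussian sample and the empirical variance ŵ_i(1 − ŵ_i) of each item's click probability, drive the exploration step. The exact form of the exploration width, the cascade simulator, and all names (ts_cascade, true_w, n_obs) are illustrative assumptions, not the authors' exact specification; consult the full paper for the precise algorithm and its regret analysis.

```python
import numpy as np

def ts_cascade(L, K, T, true_w, rng=None):
    """Illustrative TS-Cascade-style learner on a simulated cascade model.

    L ground items with (unknown) click probabilities true_w. At each of
    T steps the learner shows a list of K items; the simulated user scans
    the list in order and clicks the first attractive item, so only a
    prefix of the list is observed (partial feedback).
    """
    if rng is None:
        rng = np.random.default_rng(0)
    w_hat = np.zeros(L)  # empirical mean click probability of each item
    n_obs = np.zeros(L)  # number of times each item has been observed

    for t in range(1, T + 1):
        # A single shared univariate Gaussian sample drives exploration.
        z = rng.standard_normal()
        # Empirical variance of a Bernoulli click, with a fallback width
        # (assumed here) so unexplored items (w_hat = 0) still get perturbed.
        nu = w_hat * (1.0 - w_hat)
        width = np.maximum(np.sqrt(nu * np.log(t + 1) / (n_obs + 1)),
                           np.log(t + 1) / (n_obs + 1))
        theta = w_hat + z * width

        # Recommend the K items with the largest Thompson samples.
        chosen = np.argsort(theta)[::-1][:K]

        # Cascade feedback: the user examines items in order and clicks
        # the first attractive one; later items are never examined.
        clicks = rng.random(K) < true_w[chosen]
        last_seen = int(np.argmax(clicks)) if clicks.any() else K - 1

        # Update statistics only for the examined prefix of the list.
        for pos in range(last_seen + 1):
            i = chosen[pos]
            n_obs[i] += 1
            w_hat[i] += (float(clicks[pos]) - w_hat[i]) / n_obs[i]

    return w_hat, n_obs

# Illustrative run: 20 items, lists of length 4, 2000 update steps.
w_hat, n_obs = ts_cascade(L=20, K=4, T=2000,
                          true_w=np.linspace(0.05, 0.35, 20))
```

In this sketch the per-step work is dominated by an O(L log L) sort over a single sampled score vector, which is consistent with the low per-step cost one would expect from sampling one Gaussian per round rather than maintaining per-item confidence bounds.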

Related research

10/03/2014 · Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits
A stochastic combinatorial semi-bandit is an online learning problem whe...

03/17/2016 · Cascading Bandits for Large-Scale Recommendation Problems
Most recommender systems recommend a list of items. The user examines th...

02/10/2015 · Cascading Bandits: Learning to Rank in the Cascade Model
A search engine usually outputs a list of K web pages. The user examines...

10/30/2015 · CONQUER: Confusion Queried Online Bandit Learning
We present a new recommendation setting for picking out two items from a...

12/03/2018 · Thompson Sampling for Noncompliant Bandits
Thompson sampling, a Bayesian method for balancing exploration and explo...

06/07/2020 · Thompson Sampling for Multinomial Logit Contextual Bandits
We consider a dynamic assortment selection problem where the goal is to ...

11/19/2020 · Fully Gap-Dependent Bounds for Multinomial Logit Bandit
We study the multinomial logit (MNL) bandit problem, where at each time ...
