Solving Bernoulli Rank-One Bandits with Unimodal Thompson Sampling

12/06/2019
by Cindy Trinh, et al.

Stochastic rank-one bandits (Katariya et al., 2017a,b) are a simple framework for regret minimization problems over rank-one matrices of arms. The initially proposed algorithms are proved to have logarithmic regret, but do not match the existing lower bound for this problem. We close this gap by first proving that rank-one bandits are a particular instance of unimodal bandits, and then providing a new analysis of Unimodal Thompson Sampling (UTS), initially proposed by Paladino et al. (2017). We prove an asymptotically optimal bound on the frequentist regret of UTS, and we support our claims with simulations showing the significant improvement of our method over the state of the art.
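To make the algorithmic idea concrete, here is a minimal sketch of Unimodal Thompson Sampling on a line graph of Bernoulli arms. This is an illustrative assumption, not the paper's exact pseudocode: the priors, the leader-exploitation schedule, and the line-graph neighborhood (the paper applies UTS to the graph induced by the rank-one structure) are all simplifications. The key mechanism shown is the one the abstract describes: Thompson sampling restricted to the neighborhood of the current empirical leader.

```python
import random

def uts_line(means, horizon, seed=0):
    """Sketch of Unimodal Thompson Sampling (UTS) on a line graph of
    Bernoulli arms. Beta(1,1) priors and the periodic leader-exploitation
    schedule are illustrative choices, not the paper's exact algorithm."""
    rng = random.Random(seed)
    k = len(means)
    wins, losses, plays = [0] * k, [0] * k, [0] * k
    leader_count = [0] * k  # how often each arm has been the empirical leader
    for _ in range(horizon):
        # Empirical leader: arm with the highest posterior mean.
        emp = [(wins[i] + 1) / (wins[i] + losses[i] + 2) for i in range(k)]
        leader = max(range(k), key=lambda i: emp[i])
        leader_count[leader] += 1
        # Neighborhood of the leader on the line graph (leader included).
        neigh = [j for j in (leader - 1, leader, leader + 1) if 0 <= j < k]
        # Periodically exploit the leader (OSUB-style schedule); otherwise
        # Thompson-sample restricted to the leader's neighborhood.
        if (leader_count[leader] - 1) % len(neigh) == 0:
            arm = leader
        else:
            samples = {j: rng.betavariate(wins[j] + 1, losses[j] + 1)
                       for j in neigh}
            arm = max(samples, key=samples.get)
        # Bernoulli reward, posterior update.
        reward = 1 if rng.random() < means[arm] else 0
        wins[arm] += reward
        losses[arm] += 1 - reward
        plays[arm] += 1
    return plays
```

Because the mean vector below is unimodal over the line, the restricted sampler never needs to explore far from the leader, which is what yields the improved (asymptotically optimal) regret compared to running vanilla Thompson sampling over all arms.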


