Lexicographic Multiarmed Bandit

07/26/2019
by Alihan Hüyük, et al.

We consider a multiobjective multiarmed bandit problem with lexicographically ordered objectives. In this problem, the goal of the learner is to select lexicographically optimal arms as often as possible without knowing the arm reward distributions beforehand. We capture this goal by defining a multidimensional form of regret that measures the learner's loss due to not selecting lexicographically optimal arms, and then consider two settings in which the learner has prior information on the expected arm rewards. In the first setting, the learner knows, for each objective, only the lexicographically optimal expected reward. In the second setting, it knows, for each objective, only near-lexicographically optimal expected rewards. For both settings, we prove that the learner achieves expected regret that is uniformly bounded in time. In addition, we consider the harder prior-free case and show that the learner can still achieve gap-free regret that is sublinear in time. Finally, we experimentally evaluate the performance of the proposed algorithms on a variety of multiobjective learning problems.
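To make the notion of lexicographic optimality concrete, here is a minimal Python sketch (not the paper's algorithm): arms are filtered objective by objective in priority order, keeping at each stage only the arms that are best, up to a tie tolerance, in the current objective. The function name, the `means` array layout, and the `tol` parameter are hypothetical illustrations, not names from the paper.

```python
import numpy as np

def lexicographic_optimal_arms(means, tol=1e-9):
    """Return indices of lexicographically optimal arms.

    means: (K, D) array where means[k, d] is the expected reward of
    arm k in objective d, with objectives ordered by priority
    (d = 0 is the highest-priority objective). Values within `tol`
    of the best are treated as ties.
    """
    candidates = np.arange(means.shape[0])
    for d in range(means.shape[1]):
        best = means[candidates, d].max()
        # Keep only arms (near-)optimal in the current objective
        candidates = candidates[means[candidates, d] >= best - tol]
    return candidates

# Example: arm 1 ties for the best first objective (0.9) and then
# wins the second among the tied arms, so it is the unique optimum.
means = np.array([[0.9, 0.2],
                  [0.9, 0.7],
                  [0.5, 1.0]])
print(lexicographic_optimal_arms(means))  # -> [1]
```

In a bandit setting the true `means` are unknown; a learner would apply such a filter to estimates (e.g., empirical means plus confidence terms), which is where the multidimensional regret defined above comes in.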


Related research:

05/04/2015 · On Regret-Optimal Learning in Decentralized Multi-player Multi-armed Bandits
We consider the problem of learning in single-player and multiplayer mul...

03/03/2021 · Combinatorial Bandits without Total Order for Arms
We consider the combinatorial bandits problem, where at each time step, ...

03/11/2018 · Multi-objective Contextual Bandit Problem with Similarity Information
In this paper we propose the multi-objective contextual bandit problem w...

05/28/2019 · Repeated A/B Testing
We study a setting in which a learner faces a sequence of A/B tests and ...

11/21/2019 · Safe Linear Stochastic Bandits
We introduce the safe linear stochastic bandit framework—a generalizatio...

02/09/2016 · Compliance-Aware Bandits
Motivated by clinical trials, we study bandits with observable non-compl...

07/13/2019 · Preselection Bandits under the Plackett-Luce Model
In this paper, we introduce the Preselection Bandit problem, in which th...
