Top K Ranking for Multi-Armed Bandit with Noisy Evaluations

by Evrard Garcelon, et al.

We consider a multi-armed bandit setting where, at the beginning of each round, the learner receives noisy, independent, and possibly biased evaluations of the true reward of each arm, and selects K arms with the objective of accumulating as much reward as possible over T rounds. Under the assumption that at each round the true reward of each arm is drawn from a fixed distribution, we derive different algorithmic approaches and theoretical guarantees depending on how the evaluations are generated. First, we show an O(T^(2/3)) regret bound in the general case, where the observation functions are generalized linear functions of the true rewards. On the other hand, we show that an improved O(√(T)) regret can be derived when the observation functions are noisy linear functions of the true rewards. Finally, we report an empirical validation that confirms our theoretical findings, provides a thorough comparison to alternative approaches, and further supports the interest of this setting in practice.
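The setting described above can be sketched with a small toy simulation. This is a minimal, hedged illustration of the problem (not the paper's algorithm): true rewards are drawn each round from fixed per-arm distributions, the learner only sees noisy linear evaluations of them, and a greedy strategy picks the top-K arms by evaluation. All numerical parameters (number of arms, the linear coefficients a and b, noise levels) are assumptions made for illustration.

```python
import numpy as np

# Toy sketch of the noisy-evaluation bandit setting (illustrative only).
# Each round t: true rewards r_t are drawn from fixed per-arm distributions;
# the learner observes noisy linear evaluations e_t = a * r_t + b + noise
# and greedily selects the K arms with the largest evaluations.

rng = np.random.default_rng(0)
n_arms, K, T = 10, 3, 2000
means = rng.uniform(0.0, 1.0, size=n_arms)   # fixed per-arm reward means (assumption)
a, b, sigma = 1.5, 0.2, 0.1                  # noisy linear observation model (assumption)

total_reward = 0.0
oracle_reward = 0.0
for t in range(T):
    true_rewards = means + 0.05 * rng.standard_normal(n_arms)  # r_t drawn each round
    evals = a * true_rewards + b + sigma * rng.standard_normal(n_arms)
    chosen = np.argsort(evals)[-K:]        # greedy top-K on the noisy evaluations
    best = np.argsort(true_rewards)[-K:]   # oracle top-K on the true rewards
    total_reward += true_rewards[chosen].sum()
    oracle_reward += true_rewards[best].sum()

regret = oracle_reward - total_reward
print(round(regret, 2))
```

Because the evaluations here are an (unknown to the learner) increasing linear transform of the true rewards, greedy top-K selection incurs only the regret caused by the additive noise; with biased or generalized-linear observations, naive greedy selection can do much worse, which is what motivates the paper's distinction between the two regimes.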

