Bandits with Side Observations: Bounded vs. Logarithmic Regret

07/10/2018
by Rémy Degenne, et al.

We consider the classical stochastic multi-armed bandit but where, from time to time and roughly with frequency ϵ, an extra observation is gathered by the agent for free. We prove that, no matter how small ϵ is, the agent can ensure a regret uniformly bounded in time. More precisely, we construct an algorithm with regret smaller than ∑_i log(1/ϵ)/Δ_i, up to a multiplicative constant and loglog terms. We also prove a matching lower bound, stating that no reasonable algorithm can outperform this quantity.
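For intuition, here is a minimal simulation sketch of the setting, assuming Bernoulli rewards and assuming the free observation comes from a uniformly random arm (the paper's model may differ in these details). The names simulate_side_observations, means, eps, and horizon are hypothetical, and the UCB1 learner is a generic baseline, not the algorithm constructed in the paper.

    import numpy as np

    def simulate_side_observations(means, eps, horizon, seed=0):
        # UCB1 learner in a bandit where, after each pull, a free
        # observation of a uniformly random arm arrives with prob. eps.
        # Illustrative sketch only: not the paper's algorithm.
        rng = np.random.default_rng(seed)
        k = len(means)
        counts = np.zeros(k)   # observations per arm (paid and free)
        sums = np.zeros(k)     # summed rewards observed per arm
        best = max(means)
        regret = 0.0
        for t in range(1, horizon + 1):
            if np.any(counts == 0):
                arm = int(np.argmin(counts))  # try each arm once first
            else:
                ucb = sums / counts + np.sqrt(2.0 * np.log(t) / counts)
                arm = int(np.argmax(ucb))
            counts[arm] += 1
            sums[arm] += rng.binomial(1, means[arm])
            regret += best - means[arm]       # pseudo-regret of this pull
            if rng.random() < eps:            # free side observation
                j = rng.integers(k)
                counts[j] += 1
                sums[j] += rng.binomial(1, means[j])
        return regret

    # Free observations feed the confidence bounds, so fewer paid
    # pulls of suboptimal arms are needed as eps grows:
    for eps in (0.0, 0.01, 0.1):
        print(eps, simulate_side_observations([0.5, 0.4], eps, 50_000))

Even this generic baseline benefits from the side observations; the point of the paper is stronger, namely that a dedicated algorithm turns any fixed ϵ > 0 into regret that is bounded uniformly in time.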


Related research

02/06/2013 · Bounded regret in stochastic multi-armed bandits
We study the stochastic multi-armed bandit problem when one knows the va...

02/14/2020 · Coordination without communication: optimal regret in two players multi-armed bandits
We consider two agents playing simultaneously the same stochastic three-...

08/10/2020 · Lenient Regret for Multi-Armed Bandits
We consider the Multi-Armed Bandit (MAB) problem, where the agent sequen...

02/17/2017 · Beyond the Hazard Rate: More Perturbation Algorithms for Adversarial Multi-armed Bandits
Recent work on follow the perturbed leader (FTPL) algorithms for the adv...

07/06/2020 · Multi-Armed Bandits with Local Differential Privacy
This paper investigates the problem of regret minimization for multi-arm...

10/20/2018 · Quantifying the Burden of Exploration and the Unfairness of Free Riding
We consider the multi-armed bandit setting with a twist. Rather than hav...

10/15/2018 · Regret vs. Bandwidth Trade-off for Recommendation Systems
We consider recommendation systems that need to operate under wireless b...
