Adversarial Dueling Bandits

10/27/2020
by Aadirupa Saha, et al.

We introduce the problem of regret minimization in Adversarial Dueling Bandits. As in classic Dueling Bandits, the learner has to repeatedly choose a pair of items and observe only relative binary 'win-loss' feedback for this pair, but here this feedback is generated from an arbitrary preference matrix, possibly chosen adversarially. Our main result is an algorithm whose T-round regret compared to the Borda winner from a set of K items is Õ(K^{1/3} T^{2/3}), together with a matching Ω(K^{1/3} T^{2/3}) lower bound. We also prove a similar high-probability regret bound. We further consider a simpler fixed-gap adversarial setup, which bridges two extreme preference feedback models for dueling bandits: stationary preferences and an arbitrary sequence of preferences. For the fixed-gap adversarial setup we give an Õ((K/Δ^2) log T) regret algorithm, where Δ is the gap in Borda scores between the best item and all other items, and show a lower bound of Ω(K/Δ^2), indicating that our dependence on the main problem parameters K and Δ is tight (up to logarithmic factors).
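The abstract states its bounds in terms of Borda scores, the Borda winner and a Borda gap Δ without writing the formulas out. The Python sketch below computes those quantities under commonly used definitions, which are assumptions here rather than quotations from the paper: the Borda score of item i in round t is b_t(i) = (1/(K-1)) Σ_{j≠i} P_t(i, j), where P_t(i, j) is the probability that i beats j; the Borda winner maximizes the cumulative score Σ_t b_t(i); each round's regret is b_t(i*) - (b_t(x_t) + b_t(y_t))/2 for the dueled pair (x_t, y_t); and the fixed-gap condition requires the winner to lead every other item by at least Δ in every round. All function names are hypothetical. This only evaluates regret for a given sequence of duels; it is not the learning algorithm from the paper.

```python
from typing import List, Sequence, Tuple

# A preference matrix: P[i][j] is the probability that item i beats item j.
Matrix = List[List[float]]


def borda_scores(P: Matrix) -> List[float]:
    """Assumed Borda score: average probability of beating another item,
    b(i) = (1/(K-1)) * sum_{j != i} P[i][j]."""
    K = len(P)
    if K < 2:
        raise ValueError("need at least two items")
    return [sum(P[i][j] for j in range(K) if j != i) / (K - 1) for i in range(K)]


def borda_winner(Ps: Sequence[Matrix]) -> int:
    """Item with the largest cumulative Borda score over all rounds
    (ties broken by the smallest index)."""
    K = len(Ps[0])
    totals = [0.0] * K
    for P in Ps:
        for i, b in enumerate(borda_scores(P)):
            totals[i] += b
    return max(range(K), key=lambda i: totals[i])


def borda_regret(Ps: Sequence[Matrix], duels: Sequence[Tuple[int, int]]) -> float:
    """Cumulative regret of the dueled pairs (x_t, y_t) against the Borda winner,
    charging each round the average shortfall of the two chosen items."""
    if len(Ps) != len(duels):
        raise ValueError("need exactly one duel per round")
    star = borda_winner(Ps)
    regret = 0.0
    for P, (x, y) in zip(Ps, duels):
        b = borda_scores(P)
        regret += b[star] - (b[x] + b[y]) / 2
    return regret


def has_fixed_gap(Ps: Sequence[Matrix], delta: float) -> bool:
    """Assumed fixed-gap condition: in every round, the Borda winner's score
    exceeds every other item's score by at least delta."""
    star = borda_winner(Ps)
    for P in Ps:
        b = borda_scores(P)
        if any(b[star] - b[j] < delta for j in range(len(b)) if j != star):
            return False
    return True


# Two rounds over K = 3 items; the matrices differ between rounds (adversarial sequence).
P1 = [[0.5, 0.8, 0.7], [0.2, 0.5, 0.6], [0.3, 0.4, 0.5]]
P2 = [[0.5, 0.6, 0.9], [0.4, 0.5, 0.3], [0.1, 0.7, 0.5]]
Ps = [P1, P2]
duels = [(0, 1), (2, 2)]  # round 1 duels items 0 and 1; round 2 plays item 2 against itself

print(borda_winner(Ps), borda_regret(Ps, duels), has_fixed_gap(Ps, 0.3))
```

In this toy sequence item 0 has Borda score 0.75 in both rounds, so it is the Borda winner even though the runner-up changes from item 1 to item 2. The regret is 0.175 + 0.35 = 0.525 (up to floating-point rounding), and the smallest per-round gap is 0.35, so the assumed fixed-gap condition holds for Δ = 0.3 but not for Δ = 0.4.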


Regret Minimization in Stochastic Contextual Dueling Bandits (02/20/2020)
We consider the problem of stochastic K-armed dueling bandit in the cont...

Non-Stationary Dueling Bandits (02/02/2022)
We study the non-stationary dueling bandits problem with K arms, where t...

Human Preferences as Dueling Bandits (04/21/2022)
The dramatic improvements in core information retrieval tasks engendered...

Adversarial Bandits with Knapsacks (11/28/2018)
We consider Bandits with Knapsacks (henceforth, BwK), a general model fo...

Regret Minimisation in Multinomial Logit Bandits (03/01/2019)
We consider two regret minimisation problems over subsets of a finite gr...

Improved High-Probability Regret for Adversarial Bandits with Time-Varying Feedback Graphs (10/04/2022)
We study high-probability regret bounds for adversarial K-armed bandits ...

Duelling Bandits with Weak Regret in Adversarial Environments (12/10/2018)
Research on the multi-armed bandit problem has studied the trade-off of ...
