Dueling Bandits with Qualitative Feedback

09/14/2018
by Liyuan Xu, et al.

We formulate and study a novel multi-armed bandit problem called the qualitative dueling bandit (QDB) problem, in which an agent observes qualitative rather than numeric feedback when it pulls an arm. We employ the same regret as in the dueling bandit (DB) problem, where a duel is carried out by comparing the qualitative feedback of two arms. Although classic DB algorithms can be applied naively to the QDB problem, this reduction significantly worsens performance: in the QDB problem, the probability that one arm wins a duel against another can be estimated directly, without carrying out actual duels. In this paper, we propose such direct algorithms for the QDB problem. Our theoretical analysis shows that the proposed algorithms significantly outperform DB algorithms by incorporating the qualitative feedback, and experimental results also demonstrate vast improvement over existing DB algorithms.
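The idea of estimating a duel's outcome without running the duel can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it only shows how a win probability between two arms might be computed from each arm's own ordinal feedback samples. The function name `win_probability`, the integer encoding of qualitative levels, and the convention that a tie counts as half a win are all assumptions made here for illustration.

```python
from collections import Counter


def win_probability(feedback_i, feedback_j):
    """Estimate the chance that arm i beats arm j in a duel.

    Each argument is a non-empty list of ordinal feedback values
    (hypothetical encoding: larger integer = better qualitative level),
    collected by pulling each arm on its own. Ties are counted as half
    a win, which is an assumption of this sketch.
    """
    counts_i = Counter(feedback_i)
    counts_j = Counter(feedback_j)
    total = 0.0
    # Compare every observed level of arm i with every observed level of arm j,
    # weighting each pair by how often the two levels were seen.
    for level_i, n_i in counts_i.items():
        for level_j, n_j in counts_j.items():
            if level_i > level_j:
                total += n_i * n_j
            elif level_i == level_j:
                total += 0.5 * n_i * n_j
    return total / (len(feedback_i) * len(feedback_j))


# Example: levels encoded as 1 = "bad", 2 = "ok", 3 = "good".
arm_a = [3, 3, 2]
arm_b = [1, 2, 2]
p_ab = win_probability(arm_a, arm_b)
p_ba = win_probability(arm_b, arm_a)
```

Because each arm's feedback distribution is estimated from its own pulls, one pull of an arm updates its estimated win probability against every other arm, whereas a duel in the DB setting yields information about a single pair only.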

research
07/13/2020

Fair Algorithms for Multi-Agent Multi-Armed Bandits

We propose a multi-agent variant of the classical multi-armed bandit pro...
research
01/22/2023

Doubly Adversarial Federated Bandits

We study a new non-stochastic federated multi-armed bandit problem with ...
research
02/07/2019

KLUCB Approach to Copeland Bandits

Multi-armed bandit(MAB) problem is a reinforcement learning framework wh...
research
10/16/2021

Statistical Consequences of Dueling Bandits

Multi-Armed-Bandit frameworks have often been used by researchers to ass...
research
05/21/2017

Instrument-Armed Bandits

We extend the classic multi-armed bandit (MAB) model to the setting of n...
research
08/05/2017

Thompson Sampling Guided Stochastic Searching on the Line for Deceptive Environments with Applications to Root-Finding Problems

The multi-armed bandit problem forms the foundation for solving a wide r...
research
02/24/2021

Continuous Mean-Covariance Bandits

Existing risk-aware multi-armed bandit models typically focus on risk me...
