Multi-Dueling Bandits and Their Application to Online Ranker Evaluation

by   Brian Brost, et al.

New ranking algorithms are continually being developed and refined, necessitating the development of efficient methods for evaluating these rankers. Online ranker evaluation focuses on the challenge of efficiently determining, from implicit user feedback, which ranker out of a finite set of rankers is the best. Online ranker evaluation can be modeled by dueling ban- dits, a mathematical model for online learning under limited feedback from pairwise comparisons. Comparisons of pairs of rankers is performed by interleaving their result sets and examining which documents users click on. The dueling bandits model addresses the key issue of which pair of rankers to compare at each iteration, thereby providing a solution to the exploration-exploitation trade-off. Recently, methods for simultaneously comparing more than two rankers have been developed. However, the question of which rankers to compare at each iteration was left open. We address this question by proposing a generalization of the dueling bandits model that uses simultaneous comparisons of an unrestricted number of rankers. We evaluate our algorithm on synthetic data and several standard large-scale online ranker evaluation datasets. Our experimental results show that the algorithm yields orders of magnitude improvement in performance compared to stateof- the-art dueling bandit algorithms.


page 1

page 2

page 3

page 4


Partial Bandit and Semi-Bandit: Making the Most Out of Scarce Users' Feedback

Recent works on Multi-Armed Bandits (MAB) and Combinatorial Multi-Armed ...

Banker Online Mirror Descent

We propose Banker-OMD, a novel framework generalizing the classical Onli...

Merge Double Thompson Sampling for Large Scale Online Ranker Evaluation

Online ranker evaluation is one of the key challenges in information ret...

Learning Neural Ranking Models Online from Implicit User Feedback

Existing online learning to rank (OL2R) solutions are limited to linear ...

Efficient Exploration of Gradient Space for Online Learning to Rank

Online learning to rank (OL2R) optimizes the utility of returned search ...

PairRank: Online Pairwise Learning to Rank by Divide-and-Conquer

Online Learning to Rank (OL2R) eliminates the need of explicit relevance...

Active Evaluation: Efficient NLG Evaluation with Few Pairwise Comparisons

Recent studies have shown the advantages of evaluating NLG systems using...

Please sign up or login with your details

Forgot password? Click here to reset