Merge Double Thompson Sampling for Large Scale Online Ranker Evaluation

by Chang Li, et al.

Online ranker evaluation is one of the key challenges in information retrieval. Preferences between rankers can be inferred by interleaved comparison methods. The problem of choosing which pair of rankers should generate the result list, without degrading the user experience too much, can be formalized as a K-armed dueling bandit problem: an online partial-information learning framework in which feedback comes in the form of pairwise preferences. A commercial search system may evaluate a large number of rankers concurrently, and scaling effectively to many rankers has not been fully studied. In this paper, we focus on the large-scale online ranker evaluation problem under the so-called Condorcet assumption, which states that there exists an optimal ranker that is preferred to all other rankers. We propose Merge Double Thompson Sampling (MergeDTS), which first uses a divide-and-conquer strategy to localize the comparisons carried out by the algorithm to small batches of rankers, and then employs Thompson Sampling (TS) to reduce the comparisons between suboptimal rankers inside these small batches. The effectiveness (regret) and efficiency (time complexity) of MergeDTS are extensively evaluated using examples from the domain of online evaluation for web search. Our main finding is that, for large-scale Condorcet ranker evaluation problems, MergeDTS outperforms state-of-the-art dueling bandit algorithms.
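To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' MergeDTS algorithm. It assumes a fixed partition of rankers into batches, a simulated preference matrix `pref` in place of interleaved click feedback, and a simplified two-stage Thompson Sampling rule for picking a pair. The names `sample_pair` and `run_batched_dts` are hypothetical, and the merging and elimination of batches that the "Merge" in MergeDTS refers to is omitted.

```python
import random


def sample_pair(wins, batch, rng):
    """Pick two distinct rankers from one batch via Thompson Sampling.

    wins[i][j] counts how often ranker i beat ranker j so far.
    Simplification (assumption): the second ranker is always different
    from the first.
    """
    # Sample a plausible preference matrix from Beta posteriors.
    theta = {}
    for i in batch:
        for j in batch:
            if i < j:
                p = rng.betavariate(wins[i][j] + 1, wins[j][i] + 1)
                theta[(i, j)] = p
                theta[(j, i)] = 1.0 - p

    # First ranker: beats the most batch members in the sampled matrix.
    def sampled_wins(i):
        return sum(theta[(i, j)] > 0.5 for j in batch if j != i)

    first = max(batch, key=sampled_wins)

    # Second ranker: strongest challenger to `first` under a fresh sample.
    def challenge(j):
        return rng.betavariate(wins[j][first] + 1, wins[first][j] + 1)

    second = max((j for j in batch if j != first), key=challenge)
    return first, second


def run_batched_dts(pref, batch_size, steps, seed=0):
    """Simulate duels that are localized to fixed batches of rankers.

    pref[i][j] is the (assumed, simulated) probability that i beats j.
    """
    rng = random.Random(seed)
    k = len(pref)
    wins = [[0] * k for _ in range(k)]
    batches = [list(range(s, min(s + batch_size, k)))
               for s in range(0, k, batch_size)]
    for t in range(steps):
        batch = batches[t % len(batches)]  # round-robin over batches
        if len(batch) < 2:
            continue  # a singleton batch has nobody to duel
        a, b = sample_pair(wins, batch, rng)
        # Simulated user feedback: a wins with probability pref[a][b].
        if rng.random() < pref[a][b]:
            wins[a][b] += 1
        else:
            wins[b][a] += 1
    return wins


# Toy instance with ranker 0 as the Condorcet winner: a lower index
# beats a higher index with probability 0.6.
K = 6
pref = [[0.5 if i == j else (0.6 if i < j else 0.4) for j in range(K)]
        for i in range(K)]
wins = run_batched_dts(pref, batch_size=3, steps=300)
```

Because each duel is drawn from a single batch, the `wins` matrix only accumulates counts for pairs inside the same batch. That is the localization effect the abstract describes, here without any of the batch merging the full algorithm would need to compare winners across batches.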



KLUCB Approach to Copeland Bandits

Multi-armed bandit (MAB) problem is a reinforcement learning framework wh...

Multi-Dueling Bandits and Their Application to Online Ranker Evaluation

New ranking algorithms are continually being developed and refined, nece...

Online Information Retrieval Evaluation using the STELLA Framework

Involving users in early phases of software development has become a com...

Double-Linear Thompson Sampling for Context-Attentive Bandits

In this paper, we analyze and extend an online learning framework known ...

Sensitive and Scalable Online Evaluation with Theoretical Guarantees

Multileaved comparison methods generalize interleaved comparison methods...

A Method with Feedback for Aggregation of Group Incomplete Pair-Wise Comparisons

A method for aggregation of expert estimates in small groups is proposed...

IMDB-WIKI-SbS: An Evaluation Dataset for Crowdsourced Pairwise Comparisons

Today, comprehensive evaluation of large-scale machine learning models i...
