Accelerated Convergence for Counterfactual Learning to Rank

05/21/2020
by Rolf Jagerman, et al.

Counterfactual Learning to Rank (LTR) algorithms learn a ranking model from logged user interactions, often collected by a production system. Such an offline learning approach has many benefits compared to an online one, but it is challenging because user feedback often contains high levels of bias. Unbiased LTR uses Inverse Propensity Scoring (IPS) to enable unbiased learning from logged user interactions. A major difficulty in applying Stochastic Gradient Descent (SGD) to counterfactual learning problems is the large variance introduced by the propensity weights. In this paper we show that the convergence rate of SGD with IPS-weighted gradients suffers from this variance: convergence is slow, especially when some IPS weights are large. To overcome this limitation, we propose a novel learning algorithm, called CounterSample, that has provably better convergence than standard IPS-weighted gradient descent methods. We prove that CounterSample converges faster and complement our theoretical findings with empirical results from extensive experiments in a number of biased LTR scenarios, across optimizers, batch sizes, and degrees of position bias.
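To make the variance issue concrete, the sketch below contrasts standard IPS-weighted SGD (uniform sampling with propensity-weighted gradient steps) against a variant that samples logged examples in proportion to their IPS weights and then takes rescaled, unweighted steps, which keeps the update unbiased in expectation while bounding the per-step magnitude. The data, model, loss, and update rule here are illustrative assumptions for a pointwise logistic objective, not the paper's exact CounterSample algorithm.

```python
import numpy as np

# Minimal sketch: IPS-weighted SGD vs. sampling examples in proportion to
# their IPS weights. Data, model, and learning rate are illustrative
# assumptions, not the paper's exact setup.

rng = np.random.default_rng(0)

n, d = 1000, 10
X = rng.normal(size=(n, d))              # feature vectors of logged items
y = rng.binomial(1, 0.3, size=n)         # logged clicks (binary feedback)
prop = rng.uniform(0.05, 1.0, size=n)    # examination propensities from the logging policy
w_ips = 1.0 / prop                       # IPS weights; very large when propensities are small

def grad(theta, x, label):
    """Gradient of a pointwise logistic loss for a single example."""
    p = 1.0 / (1.0 + np.exp(-x @ theta))
    return (p - label) * x

lr = 0.01

# --- Standard IPS-weighted SGD: uniform sampling, weighted gradient steps ---
theta = np.zeros(d)
for _ in range(5000):
    i = rng.integers(n)                                    # sample an example uniformly
    theta -= lr * w_ips[i] * grad(theta, X[i], y[i])       # occasional very large (high-variance) steps

# --- Weight-proportional sampling: sample by IPS weight, rescaled unweighted steps ---
theta2 = np.zeros(d)
p_sample = w_ips / w_ips.sum()           # sampling distribution over logged examples
scale = w_ips.mean()                     # keeps the update unbiased in expectation
for _ in range(5000):
    i = rng.choice(n, p=p_sample)                          # large-weight examples are drawn more often
    theta2 -= lr * scale * grad(theta2, X[i], y[i])        # per-step magnitude no longer depends on w_ips[i]
```

Both loops estimate the same expected gradient; the difference is that the weight-proportional sampler avoids multiplying any single step by an extreme 1/propensity factor, which is the source of the variance and slow convergence analyzed in the paper.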

