Offline A/B testing for Recommender Systems

01/22/2018
by Alexandre Gilotte, et al.

Before A/B testing a new version of a recommender system online, it is usual to perform offline evaluations on historical data. We focus on evaluation methods that compute an estimator of the potential uplift in revenue that the new technology could generate. This helps to iterate faster and to avoid losing money by detecting poor policies. These estimators are known as counterfactual or off-policy estimators. We show experimentally that traditional counterfactual estimators such as capped importance sampling and normalised importance sampling do not offer a satisfactory bias-variance trade-off in the context of personalised product recommendation for online advertising. We propose two variants of counterfactual estimators, with different models of the bias, that prove to be accurate in real-world conditions. We benchmark these estimators by showing their correlation with business metrics observed in online A/B tests on a commercial recommender system.
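For readers unfamiliar with the baselines named in the abstract, the following is a minimal sketch of the textbook forms of importance sampling, capped importance sampling, and normalised importance sampling. It is not the paper's proposed variants. The function names, the `cap` parameter, and the toy numbers are assumptions made for illustration; each logged event is assumed to carry a reward, the probability the logging policy gave to the displayed action, and the probability the candidate policy would give to it.

```python
import numpy as np

def importance_weights(pi_test, pi_logging):
    """Ratio of candidate-policy to logging-policy probabilities per logged event."""
    return np.asarray(pi_test, dtype=float) / np.asarray(pi_logging, dtype=float)

def is_estimate(rewards, weights):
    """Plain importance sampling: unbiased, but variance grows with large weights."""
    rewards = np.asarray(rewards, dtype=float)
    return float(np.mean(weights * rewards))

def capped_is_estimate(rewards, weights, cap):
    """Capped importance sampling: weights above `cap` are clipped to `cap`.
    This lowers variance at the cost of introducing bias."""
    rewards = np.asarray(rewards, dtype=float)
    return float(np.mean(np.minimum(weights, cap) * rewards))

def normalised_is_estimate(rewards, weights):
    """Normalised (self-normalised) importance sampling: divide by the sum of
    weights rather than by the number of events."""
    rewards = np.asarray(rewards, dtype=float)
    return float(np.sum(weights * rewards) / np.sum(weights))

# Toy log of four events (hypothetical numbers, for illustration only).
rewards = [1.0, 0.0, 2.0, 0.0]
pi_logging = [0.5, 0.5, 0.25, 0.5]
pi_test = [0.25, 0.5, 0.75, 0.5]

w = importance_weights(pi_test, pi_logging)
print(is_estimate(rewards, w))
print(capped_is_estimate(rewards, w, cap=2.0))
print(normalised_is_estimate(rewards, w))
```

On this toy log the three estimators disagree noticeably because a single event carries a weight of 3, which illustrates the bias-variance tension the abstract refers to.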


research
04/03/2017

A comparative study of counterfactual estimators

We provide a comparative study of several widely used off-policy estimat...
research
11/06/2018

CAB: Continuous Adaptive Blending Estimator for Policy Evaluation and Learning

The ability to perform offline A/B-testing and off-policy learning using...
research
08/08/2022

Fast Offline Policy Optimization for Large Scale Recommendation

Personalised interactive systems such as recommender systems require sel...
research
04/22/2020

Optimization Approaches for Counterfactual Risk Minimization with Continuous Actions

Counterfactual reasoning from logged data has become increasingly import...
research
09/17/2021

Data-Driven Off-Policy Estimator Selection: An Application in User Marketing on An Online Content Delivery Service

Off-policy evaluation (OPE) is the method that attempts to estimate the ...
research
07/23/2019

Off-policy Learning for Multiple Loggers

It is well known that the historical logs are used for evaluating and le...
research
04/15/2022

Transfer Importance Sampling – How Testing Automated Vehicles in Multiple Test Setups Helps With the Bias-Variance Tradeoff

The promise of increased road safety is a key motivator for the developm...
