Enhanced Doubly Robust Learning for Debiasing Post-click Conversion Rate Estimation

by Siyuan Guo, et al.

Post-click conversion is a strong signal of user preference and is therefore valuable for building recommender systems. However, accurately estimating the post-click conversion rate (CVR) is challenging because of selection bias, i.e., the observed click events usually happen on items the user already prefers. Most existing methods use counterfactual learning to debias recommender systems. Among them, the doubly robust (DR) estimator has achieved competitive performance by combining the error imputation based (EIB) estimator and the inverse propensity score (IPS) estimator in a doubly robust way. However, inaccurate error imputation may give the DR estimator a higher variance than the IPS estimator. Worse still, existing methods typically use simple model-agnostic methods to estimate the imputation error, which are not sufficient to approximate the dynamically changing, model-correlated target (i.e., the gradient direction of the prediction model). To solve these problems, we first derive the bias and variance of the DR estimator. Based on this analysis, we propose a more robust doubly robust (MRDR) estimator that further reduces the variance while retaining double robustness. Moreover, we propose a novel double learning approach for the MRDR estimator, which converts error imputation into general CVR estimation. We also verify empirically that the proposed learning scheme can further eliminate the high-variance problem of imputation learning. To evaluate its effectiveness, we conduct extensive experiments on a semi-synthetic dataset and two real-world datasets. The results demonstrate the superiority of the proposed approach over state-of-the-art methods. The code is available at https://github.com/guosyjlu/MRDR-DL.
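
To make the relationship between the three estimators named in the abstract concrete, here is a minimal sketch of IPS, EIB, and DR in the forms commonly used in the debiased-recommendation literature. The abstract does not state these formulas, so the exact expressions, the function names, and the toy numbers are assumptions for illustration; this is not the MRDR estimator or the code from the linked repository.

```python
# Hypothetical sketch of the standard IPS, EIB and DR estimators.
# errors[i]       : prediction error on pair i (only meaningful if clicked[i] == 1)
# clicked[i]      : 1 if the user-item pair was clicked (error observed), else 0
# propensities[i] : estimated click probability for pair i
# imputed[i]      : imputed (predicted) error for pair i

def ips_estimate(errors, clicked, propensities):
    """Inverse propensity score: reweight observed errors by 1 / propensity."""
    n = len(errors)
    return sum(o * e / p for e, o, p in zip(errors, clicked, propensities)) / n

def eib_estimate(errors, clicked, imputed):
    """Error imputation based: observed error if clicked, imputed error otherwise."""
    n = len(errors)
    return sum(o * e + (1 - o) * e_hat
               for e, o, e_hat in zip(errors, clicked, imputed)) / n

def dr_estimate(errors, clicked, propensities, imputed):
    """Doubly robust: imputed error plus a propensity-weighted correction."""
    n = len(errors)
    return sum(e_hat + o * (e - e_hat) / p
               for e, o, p, e_hat in zip(errors, clicked, propensities, imputed)) / n

# Toy data (made up). Errors of unclicked pairs are unknown in practice;
# 0.0 is a placeholder that is always multiplied by clicked == 0.
true_errors = [0.2, 0.8, 0.5, 0.1]
clicked = [1, 0, 1, 0]
observed = [e if o else 0.0 for e, o in zip(true_errors, clicked)]
wrong_propensities = [0.9, 0.3, 0.6, 0.2]  # deliberately arbitrary

# With perfect imputation the correction term vanishes, so DR matches the
# mean of the true errors even though the propensities are wrong.
dr_value = dr_estimate(observed, clicked, wrong_propensities, true_errors)
true_mean = sum(true_errors) / len(true_errors)
print(dr_value, true_mean)
```

The toy run illustrates one half of double robustness: accurate imputed errors compensate for inaccurate propensities. The other half (accurate propensities compensating for inaccurate imputation) holds in expectation over the click indicator, so it does not show up exactly on a single small sample.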




CDR: Conservative Doubly Robust Learning for Debiased Recommendation

In recommendation systems (RS), user behavior data is observational rath...

Off-Policy Evaluation Using Information Borrowing and Context-Based Switching

We consider the off-policy evaluation (OPE) problem in contextual bandit...

Statistical Inference after Kernel Ridge Regression Imputation under item nonresponse

Imputation is a popular technique for handling missing data. We consider...

Doubly-Robust Estimation for Unbiased Learning-to-Rank from Position-Biased Click Feedback

Clicks on rankings suffer from position bias: generally items on lower r...

Debiasing Learning for Membership Inference Attacks Against Recommender Systems

Learned recommender systems may inadvertently leak information about the...

Capturing Conversion Rate Fluctuation during Sales Promotions: A Novel Historical Data Reuse Approach

Conversion rate (CVR) prediction is one of the core components in online...

A Causal Perspective to Unbiased Conversion Rate Estimation on Data Missing Not at Random

In modern e-commerce and advertising recommender systems, ongoing resear...
