Doubly Robust Estimator for Off-Policy Evaluation with Large Action Spaces

08/07/2023
by   Tatsuhiro Shimizu, et al.
0

We study Off-Policy Evaluation (OPE) in contextual bandit settings with large action spaces. The benchmark estimators suffer from severe bias and variance tradeoffs. Parametric approaches suffer from bias due to difficulty specifying the correct model, whereas ones with importance weight suffer from variance. To overcome these limitations, Marginalized Inverse Propensity Scoring (MIPS) was proposed to mitigate the estimator's variance via embeddings of an action. To make the estimator more accurate, we propose the doubly robust estimator of MIPS called the Marginalized Doubly Robust (MDR) estimator. Theoretical analysis shows that the proposed estimator is unbiased under weaker assumptions than MIPS while maintaining variance reduction against IPS, which was the main advantage of MIPS. The empirical experiment verifies the supremacy of MDR against existing estimators.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/13/2022

Off-Policy Evaluation for Large Action Spaces via Embeddings

Off-policy evaluation (OPE) in contextual bandits has seen rapid adoptio...
research
05/14/2023

Off-Policy Evaluation for Large Action Spaces via Conjunct Effect Modeling

We study off-policy evaluation (OPE) of contextual bandit policies for l...
research
09/03/2023

Double Clipping: Less-Biased Variance Reduction in Off-Policy Evaluation

"Clipping" (a.k.a. importance weight truncation) is a widely used varian...
research
07/13/2023

Leveraging Factored Action Spaces for Off-Policy Evaluation

Off-policy evaluation (OPE) aims to estimate the benefit of following a ...
research
07/22/2019

Doubly robust off-policy evaluation with shrinkage

We design a new family of estimators for off-policy evaluation in contex...
research
10/15/2022

Off-policy evaluation for learning-to-rank via interpolating the item-position model and the position-based model

A critical need for industrial recommender systems is the ability to eva...
research
06/06/2022

Markovian Interference in Experiments

We consider experiments in dynamical systems where interventions on some...

Please sign up or login with your details

Forgot password? Click here to reset