Statistical Inference: The Missing Piece of RecSys Experiment Reliability Discourse
This paper calls attention to the missing component of the recommender system evaluation process: statistical inference. There is active research on several other components of that process, including baseline selection, benchmark standardization, and target item sampling. However, there has not yet been significant work on the role and use of statistical inference for analyzing recommender system evaluation results. In this paper, we argue that statistical inference is a key component of the evaluation process that has not received sufficient attention. We support this argument with a systematic review of recent RecSys papers to understand how statistical inference is currently being used, along with a brief survey of studies on the use of statistical inference in the information retrieval community. We present several challenges for inference in recommendation experiments, which buttress the need for empirical studies to guide the selection and application of appropriate statistical inference techniques.
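As an illustration of the kind of inference at issue (a sketch, not the authors' method), the code below applies a paired randomization (sign-flip) test, one of the tests studied in the information retrieval literature, to per-user metric scores from two recommenders evaluated on the same users. The function name `paired_randomization_test`, the choice of a per-user metric such as nDCG@10, and the example scores are hypothetical assumptions for illustration only.

```python
import random


def paired_randomization_test(scores_a, scores_b, n_permutations=10_000, seed=0):
    """Two-sided paired randomization (sign-flip) test on per-user scores.

    Hypothetical helper: scores_a[i] and scores_b[i] are assumed to be the
    same metric (e.g. nDCG@10) for the same user i under systems A and B.
    Returns (observed mean difference, Monte Carlo estimate of the p-value).
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty, equally sized score lists")

    # Pair scores by user so between-user variation cancels out.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = sum(diffs) / n

    rng = random.Random(seed)  # fixed seed for a reproducible estimate
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the system labels are exchangeable,
        # so each user's difference is equally likely to have either sign.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if abs(permuted) >= abs(observed):
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Usage with made-up per-user scores (hypothetical data, not from the paper).
ndcg_a = [0.42, 0.31, 0.55, 0.60, 0.12, 0.48, 0.39, 0.51]
ndcg_b = [0.40, 0.28, 0.50, 0.61, 0.10, 0.45, 0.35, 0.47]
diff, p = paired_randomization_test(ndcg_a, ndcg_b)
print(f"mean difference = {diff:.4f}, p ~ {p:.4f}")
```

Pairing by user is the key design choice: it tests the per-user differences rather than two independent averages. The p-value is a Monte Carlo estimate, so it varies slightly with `seed` and `n_permutations`; which test is appropriate for a given recommendation experiment is exactly the kind of question the abstract says needs empirical study.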