Optimizing Long-term Value for Auction-Based Recommender Systems via On-Policy Reinforcement Learning

05/23/2023
by   Ruiyang Xu, et al.
0

Auction-based recommender systems are prevalent in online advertising platforms, but they are typically optimized to allocate recommendation slots based on immediate expected return metrics, neglecting the downstream effects of recommendations on user behavior. In this study, we employ reinforcement learning to optimize for long-term return metrics in an auction-based recommender system. Utilizing temporal difference learning, a fundamental reinforcement learning algorithm, we implement an one-step policy improvement approach that biases the system towards recommendations with higher long-term user engagement metrics. This optimizes value over long horizons while maintaining compatibility with the auction framework. Our approach is grounded in dynamic programming ideas which show that our method provably improves upon the existing auction-based base policy. Through an online A/B test conducted on an auction-based recommender system which handles billions of impressions and users daily, we empirically establish that our proposed method outperforms the current production system in terms of long-term user engagement metrics.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/29/2019

Reinforcement Learning for Slate-based Recommender Systems: A Tractable Decomposition and Practical Methodology

Most practical recommender systems focus on estimating immediate user en...
research
02/07/2023

Optimizing Audio Recommendations for the Long-Term: A Reinforcement Learning Perspective

We study the problem of optimizing a recommender system for outcomes tha...
research
12/20/2020

Reinforcement Learning-based Product Delivery Frequency Control

Frequency control is an important problem in modern recommender systems....
research
06/01/2022

ResAct: Reinforcing Long-term Engagement in Sequential Recommendation with Residual Actor

Long-term engagement is preferred over immediate engagement in sequentia...
research
06/13/2019

Early Detection of Long Term Evaluation Criteria in Online Controlled Experiments

A common dilemma encountered by many upon implementing an optimization m...
research
09/20/2018

Predicting Periodicity with Temporal Difference Learning

Temporal difference (TD) learning is an important approach in reinforcem...
research
02/17/2022

Should I send this notification? Optimizing push notifications decision making by modeling the future

Most recommender systems are myopic, that is they optimize based on the ...

Please sign up or login with your details

Forgot password? Click here to reset