ResAct: Reinforcing Long-term Engagement in Sequential Recommendation with Residual Actor

by Wanqi Xue, et al.

Long-term engagement is preferred over immediate engagement in sequential recommendation as it directly affects product operational metrics such as daily active users (DAUs) and dwell time. Meanwhile, reinforcement learning (RL) is widely regarded as a promising framework for optimizing long-term engagement in sequential recommendation. However, because online interactions are expensive, it is very difficult for RL algorithms to perform state-action value estimation, exploration, and feature extraction when optimizing long-term engagement. In this paper, we propose ResAct, which seeks a policy that is close to, but better than, the online-serving policy. In this way, we can collect sufficient data near the learned policy so that state-action values can be properly estimated, and there is no need to perform online exploration. Directly optimizing this policy is difficult due to the huge policy space. ResAct instead addresses this by first reconstructing the online behaviors and then improving them. Our main contributions are fourfold. First, we design a generative model which reconstructs behaviors of the online-serving policy by sampling multiple action estimators. Second, we design an effective learning paradigm to train the residual actor, which outputs the residual for action improvement. Third, we facilitate the extraction of features with two information-theoretical regularizers to ensure the expressiveness and conciseness of features. Fourth, we conduct extensive experiments on a real-world dataset consisting of millions of sessions, and our method significantly outperforms the state-of-the-art baselines in various long-term engagement optimization tasks.
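The reconstruct-then-improve idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names (`reconstruct_actions`, `residual_action`, `select_action`), the clipping bound `max_residual`, and the use of a value function `q_fn` to pick among candidates are all assumptions made for illustration. Actions are plain lists of floats, and the estimators, residual actor, and value function are passed in as ordinary callables.

```python
def reconstruct_actions(state, estimators):
    """Return one candidate action per estimator.

    Each estimator is a hypothetical callable standing in for a sample
    from a model that imitates the online-serving policy.
    """
    return [estimator(state) for estimator in estimators]


def residual_action(state, base_action, residual_fn, max_residual=0.1):
    """Add a bounded residual to a reconstructed action.

    Clipping to [-max_residual, max_residual] is an assumption here; it
    keeps the improved action close to the reconstructed one.
    """
    residual = residual_fn(state, base_action)
    bounded = [max(-max_residual, min(max_residual, r)) for r in residual]
    return [a + r for a, r in zip(base_action, bounded)]


def select_action(state, estimators, residual_fn, q_fn, max_residual=0.1):
    """Improve every reconstructed candidate, then keep the one with the
    highest estimated value (selection by q_fn is an assumption)."""
    candidates = [
        residual_action(state, action, residual_fn, max_residual)
        for action in reconstruct_actions(state, estimators)
    ]
    return max(candidates, key=lambda action: q_fn(state, action))


if __name__ == "__main__":
    # Toy stand-ins: two fixed estimators, a constant residual, and a
    # value function that prefers actions near 0.6.
    toy_estimators = [lambda s: [0.2], lambda s: [0.5]]
    toy_residual = lambda s, a: [1.0]
    toy_q = lambda s, a: -abs(a[0] - 0.6)
    print(select_action([0.0], toy_estimators, toy_residual, toy_q))
```

Because the residual is bounded, the selected action stays near behaviors already covered by logged data, which is the property the abstract relies on for estimating state-action values without online exploration.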



PrefRec: Preference-based Recommender Systems for Reinforcing Long-term User Engagement

Current advances in recommender systems have been remarkably successful ...

Reinforcement Learning to Optimize Long-term User Engagement in Recommender Systems

Recommender systems play a crucial role in our daily lives. Feed streami...

Optimizing Long-term Value for Auction-Based Recommender Systems via On-Policy Reinforcement Learning

Auction-based recommender systems are prevalent in online advertising pl...

Learning on the Job: Long-Term Behavioural Adaptation in Human-Robot Interactions

In this work, we propose a framework for allowing autonomous robots depl...

Improving Long-Term Metrics in Recommendation Systems using Short-Horizon Offline RL

We study session-based recommendation scenarios where we want to recomme...

Learning to Take a Break: Sustainable Optimization of Long-Term User Engagement

Optimizing user engagement is a key goal for modern recommendation syste...

Sequential Search with Off-Policy Reinforcement Learning

Recent years have seen a significant amount of interest in Sequential R...
