PrefRec: Preference-based Recommender Systems for Reinforcing Long-term User Engagement

by Wanqi Xue, et al.

Recent advances in recommender systems have been remarkably successful at optimizing immediate engagement. However, long-term user engagement, a more desirable performance metric, remains difficult to improve. Meanwhile, recent reinforcement learning (RL) algorithms have proven effective in a variety of long-term goal optimization tasks. For this reason, RL is widely considered a promising framework for optimizing long-term user engagement in recommendation. Despite this promise, applying RL relies heavily on well-designed rewards, and designing rewards that capture long-term user engagement is quite difficult. To mitigate this problem, we propose a novel paradigm, Preference-based Recommender systems (PrefRec), which allows RL recommender systems to learn from preferences over users' historical behaviors rather than from explicitly defined rewards. Such preferences are easy to obtain through techniques such as crowdsourcing, since they require no expert knowledge. With PrefRec, we can fully exploit the advantages of RL in optimizing long-term goals while avoiding complex reward engineering. PrefRec uses the preferences to train a reward function automatically in an end-to-end manner. The reward function then generates learning signals to train the recommendation policy. Furthermore, we design an effective optimization method for PrefRec, which uses an additional value function, expectile regression, and reward model pre-training to improve performance. Extensive experiments are conducted on a variety of long-term user engagement optimization tasks. The results show that PrefRec significantly outperforms previous state-of-the-art methods in all the tasks.
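To make the two learning ingredients named above more concrete, here is a minimal sketch in plain Python. It assumes the Bradley–Terry formulation that is common in preference-based RL: the probability that one behavior segment is preferred over another is the sigmoid of the difference in their summed predicted rewards, and the reward model is fit with cross-entropy on preference labels. The abstract does not state which preference model PrefRec uses, so this is an assumption. The reward model here is a hypothetical linear function of per-step features (`linear_reward`, `train_reward` and the toy features are illustrative names, not from the paper). `expectile_loss` is the standard asymmetric squared loss of expectile regression, of the kind used to fit value functions in offline RL. How PrefRec applies it to its additional value function is not specified in the abstract.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: a step is a feature vector, a segment is a list of steps.
Step = Sequence[float]
Segment = Sequence[Step]


def linear_reward(weights: Sequence[float]) -> Callable[[Step], float]:
    """Hypothetical reward model: a linear function of step features."""
    def r(step: Step) -> float:
        return sum(w * x for w, x in zip(weights, step))
    return r


def preference_prob(reward: Callable[[Step], float], seg_a: Segment, seg_b: Segment) -> float:
    """Bradley-Terry (assumed) probability that seg_a is preferred over seg_b."""
    d = sum(reward(s) for s in seg_a) - sum(reward(s) for s in seg_b)
    # Numerically stable sigmoid(d).
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def preference_loss(reward, seg_a: Segment, seg_b: Segment, label: float) -> float:
    """Cross-entropy between the predicted preference and a label
    (1.0 = seg_a preferred, 0.0 = seg_b preferred, 0.5 = tie)."""
    p = preference_prob(reward, seg_a, seg_b)
    eps = 1e-12
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))


def train_reward(pairs: List[Tuple[Segment, Segment, float]], dim: int,
                 lr: float = 0.1, epochs: int = 200) -> List[float]:
    """Fit linear reward weights from preference labels by gradient descent."""
    w = [0.0] * dim
    for _ in range(epochs):
        for seg_a, seg_b, label in pairs:
            p = preference_prob(linear_reward(w), seg_a, seg_b)
            fa = [sum(s[i] for s in seg_a) for i in range(dim)]
            fb = [sum(s[i] for s in seg_b) for i in range(dim)]
            # d(loss)/dw = (p - label) * (fa - fb); step against the gradient.
            w = [wi + lr * (label - p) * (a - b) for wi, a, b in zip(w, fa, fb)]
    return w


def expectile_loss(diff: float, tau: float = 0.7) -> float:
    """Expectile regression loss |tau - 1(diff < 0)| * diff**2.
    With tau > 0.5, positive residuals are penalized more than negative ones."""
    weight = tau if diff >= 0 else 1.0 - tau
    return weight * diff * diff


# Toy usage: an annotator prefers the segment whose first feature
# (a hypothetical "user returned" signal) is active.
seg_a = [[1.0, 0.0], [1.0, 0.0]]
seg_b = [[0.0, 1.0], [0.0, 1.0]]
w = train_reward([(seg_a, seg_b, 1.0)], dim=2)
print(w, preference_prob(linear_reward(w), seg_a, seg_b))
```

The learned reward could then label transitions for an RL recommendation policy, which is the role the abstract assigns to it. Choosing `tau > 0.5` in the expectile loss biases the fitted value toward the upper range of its targets. This is one common reason expectile regression is paired with value learning, though the paper's own rationale may differ.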


Reinforcement Learning to Optimize Long-term User Engagement in Recommender Systems

Recommender systems play a crucial role in our daily lives. Feed streami...

ResAct: Reinforcing Long-term Engagement in Sequential Recommendation with Residual Actor

Long-term engagement is preferred over immediate engagement in sequentia...

From Clicks to Conversions: Recommendation for long-term reward

Recommender systems are often optimised for short-term reward: a recomme...

Learning to Take a Break: Sustainable Optimization of Long-Term User Engagement

Optimizing user engagement is a key goal for modern recommendation syste...

Local Policy Improvement for Recommender Systems

Recommender systems aim to answer the following question: given the item...

Towards Validating Long-Term User Feedbacks in Interactive Recommendation Systems

Interactive Recommender Systems (IRSs) have attracted a lot of attention...

Should I send this notification? Optimizing push notifications decision making by modeling the future

Most recommender systems are myopic, that is they optimize based on the ...
