BLOB: A Probabilistic Model for Recommendation that Combines Organic and Bandit Signals

08/28/2020
by   Otmane Sakhi, et al.
A common task for recommender systems is to build a profile of a user's interests from the items in their browsing history, and later to recommend items to that user from the same catalog. The user's behavior consists of two parts: the sequence of items they viewed without intervention (the organic part), and the sequence of items recommended to them together with the outcome of each recommendation (the bandit part). In this paper, we propose the Bayesian Latent Organic Bandit model (BLOB), a probabilistic approach that combines the 'organic' and 'bandit' signals in order to improve the estimation of recommendation quality. The bandit signal is valuable because it gives direct feedback on recommendation performance, but its quality is very uneven, as it is highly concentrated on the recommendations deemed optimal by the past version of the recommender system. In contrast, the organic signal is typically strong and covers most items, but is not always relevant to the recommendation task. To leverage the organic signal to efficiently learn the bandit signal in a Bayesian model, we identify three fundamental types of distances: action-history, action-action, and history-history distances. We implement a scalable approximation of the full model using variational auto-encoders and the local reparameterization trick. We show, using extensive simulation studies, that our method outperforms or matches the value of both state-of-the-art organic-based recommendation algorithms and bandit-based methods (both value-based and policy-based), in both organic-rich and bandit-rich environments.
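The abstract names the local reparameterization trick without describing it. As a rough illustration only, and not the authors' implementation, the sketch below shows the general idea for a single linear layer whose weights have independent Gaussian posteriors: rather than sampling every weight, it samples the layer's pre-activations directly from the Gaussian they induce. The function name `local_reparam_linear`, its arguments, and the toy numbers are all assumptions made for this example.

```python
import math
import random


def local_reparam_linear(x, w_mean, w_var, rng):
    """Sample the outputs of a linear layer with factorized Gaussian weights.

    x      : list of n_in input values
    w_mean : n_in x n_out nested list of weight posterior means
    w_var  : n_in x n_out nested list of weight posterior variances
    rng    : a random.Random instance supplying the Gaussian noise
    """
    n_in, n_out = len(w_mean), len(w_mean[0])
    out = []
    for j in range(n_out):
        # Mean and variance of the pre-activation implied by the weight posterior.
        act_mean = sum(x[i] * w_mean[i][j] for i in range(n_in))
        act_var = sum(x[i] ** 2 * w_var[i][j] for i in range(n_in))
        # One noise draw per output unit instead of one per weight.
        out.append(act_mean + math.sqrt(act_var) * rng.gauss(0.0, 1.0))
    return out


# Hypothetical toy values, chosen only to make the sketch runnable.
rng = random.Random(0)
x = [1.0, -2.0, 0.5]
w_mean = [[0.1, 0.2], [0.3, -0.4], [0.5, 0.6]]
w_var = [[0.01, 0.01], [0.01, 0.01], [0.01, 0.01]]
sample = local_reparam_linear(x, w_mean, w_var, rng)
```

Because the noise is drawn per output rather than per weight, the sample stays differentiable in the posterior means and variances while typically giving lower-variance gradient estimates, which is the usual reason for pairing this trick with variational training.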


