Non-Stationary Latent Bandits

by Joey Hong, et al.

Users of recommender systems often behave in a non-stationary fashion, due to their evolving preferences and tastes over time. In this work, we propose a practical approach for fast personalization to non-stationary users. The key idea is to frame this problem as a latent bandit, where prototypical models of user behavior are learned offline and the latent state of the user is inferred online from their interactions with the models. We call this problem a non-stationary latent bandit. We propose Thompson sampling algorithms for regret minimization in non-stationary latent bandits, analyze them, and evaluate them on a real-world dataset. The main strength of our approach is that it can be combined with rich offline-learned models, which may be misspecified and are subsequently fine-tuned online using posterior sampling. In this way, we naturally combine the strengths of offline and online learning.
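To make the key idea concrete, here is a minimal sketch of Thompson sampling in a (stationary, for simplicity) latent bandit. The mean-reward table `mu`, the Gaussian noise level `sigma`, and the problem sizes are all illustrative assumptions; in the paper's setting, these conditional reward models would be learned offline, and the agent would additionally track a transition model over latent states.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical offline-learned models: mean reward of each arm under each
# latent user state (rows: latent states, columns: arms). In practice these
# would be estimated from offline interaction logs.
mu = np.array([[0.9, 0.2, 0.1],
               [0.1, 0.8, 0.3]])
n_states, n_arms = mu.shape

true_state = 0                               # unknown to the agent
belief = np.full(n_states, 1.0 / n_states)   # posterior over latent states
sigma = 0.1                                  # assumed known reward noise

for t in range(200):
    s = rng.choice(n_states, p=belief)       # Thompson sampling: sample a state
    arm = int(np.argmax(mu[s]))              # act greedily under the sampled state
    reward = mu[true_state, arm] + sigma * rng.normal()
    # Bayesian update of the state posterior under the Gaussian likelihood
    lik = np.exp(-0.5 * ((reward - mu[:, arm]) / sigma) ** 2)
    belief = belief * lik
    belief /= belief.sum()

# The posterior should concentrate on the true latent state.
print(np.argmax(belief), belief.round(3))
```

The non-stationary variant adds one step: before each round, the belief is propagated through a latent-state transition kernel, so the posterior can track a user whose state drifts over time.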


Cascading Non-Stationary Bandits: Online Learning to Rank in the Non-Stationary Cascade Model

Non-stationarity appears in many online applications such as web search ...

Learning User Preferences in Non-Stationary Environments

Recommendation systems often use online collaborative filtering (CF) alg...

ReLoop2: Building Self-Adaptive Recommendation Models via Responsive Error Compensation Loop

Industrial recommender systems face the challenge of operating in non-st...

Latent Bandits Revisited

A latent bandit problem is one in which the learning agent knows the arm...

Online Learning for Non-Stationary A/B Tests

The rollout of new versions of a feature in modern applications is a man...

Optimizing Ranking Systems Online as Bandits

Ranking system is the core part of modern retrieval and recommender syst...

Non-Stationary Bandits with Intermediate Observations

Online recommender systems often face long delays in receiving feedback,...
