Latent Bandits Revisited

by Joey Hong et al.

A latent bandit problem is one in which the learning agent knows the arm reward distributions conditioned on an unknown discrete latent state. The primary goal of the agent is to identify the latent state, after which it can act optimally. This setting is a natural midpoint between online and offline learning—complex models can be learned offline, with the agent identifying the latent state online—and is of practical relevance in, say, recommender systems. In this work, we propose general algorithms for this setting, based on both upper confidence bounds (UCBs) and Thompson sampling. Our methods are contextual and aware of model uncertainty and misspecification. We provide a unified theoretical analysis of our algorithms, which have lower regret than classic bandit policies when the number of latent states is smaller than the number of actions. A comprehensive empirical study showcases the advantages of our approach.
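To make the setting concrete, here is a minimal sketch of Thompson sampling in a latent bandit — not the paper's exact algorithm, and ignoring context, model uncertainty, and misspecification. The agent knows the Bernoulli reward means `mu[s, a]` for each latent state `s` (learned offline, per the abstract), maintains a posterior over the unknown state, samples a state each round, and pulls that state's best arm. All variable names and the problem sizes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_arms, horizon = 3, 10, 2000
# Conditional reward model, assumed known to the agent (learned offline).
mu = rng.uniform(0.1, 0.9, size=(n_states, n_arms))
true_state = 1  # unknown to the agent

posterior = np.full(n_states, 1.0 / n_states)  # uniform prior over latent states
total_reward = 0.0

for _ in range(horizon):
    s = rng.choice(n_states, p=posterior)       # sample a plausible latent state
    a = int(np.argmax(mu[s]))                   # act optimally for that state
    r = float(rng.random() < mu[true_state, a]) # Bernoulli reward from true state
    total_reward += r
    # Bayesian update: likelihood of the observed reward under each state.
    like = mu[:, a] if r == 1.0 else 1.0 - mu[:, a]
    posterior = posterior * like
    posterior /= posterior.sum()

regret = horizon * mu[true_state].max() - total_reward
print(f"P(true state) = {posterior[true_state]:.3f}, regret = {regret:.1f}")
```

Because the agent only has to distinguish a handful of latent states rather than explore every arm from scratch, the posterior concentrates quickly and regret stays far below that of a model-free bandit over the same arms — the intuition behind the abstract's claim that regret scales with the number of latent states rather than actions.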



