Online Bayesian Recommendation with No Regret

by Yiding Feng, et al.

We introduce and study the online Bayesian recommendation problem for a platform that can observe a utility-relevant state of a product and repeatedly interacts with a population of myopic users through an online recommendation mechanism. This paradigm is common in a wide range of scenarios in the current Internet economy. For each user, who has her own private preferences and beliefs, the platform commits to a recommendation strategy that uses its information advantage about the product state to persuade the self-interested user to follow the recommendation. The platform does not know the users' preferences and beliefs, and must use an adaptive recommendation strategy that persuades users while gradually learning their preferences and beliefs. We aim to design online learning policies with no Stackelberg regret for the platform, i.e., no regret against the optimal policy in hindsight under the assumption that users adapt their behavior to the benchmark policy. Our first result is an online policy that achieves regret with doubly logarithmic dependence on the number of rounds. We then present a hardness result showing that no adaptive online policy can achieve regret with a better dependence on the number of rounds. Finally, by formulating the platform's problem as optimizing a linear program with membership oracle access, we present our second online policy, which achieves regret with polynomial dependence on the number of states but logarithmic dependence on the number of rounds.
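
As a rough illustration of the setting described above, the following Python sketch considers a hypothetical two-state instance (the product is either good or bad). It assumes that every user shares the platform's prior and follows a "buy" recommendation only when her posterior belief that the product is good reaches an unknown threshold. The names `posterior_good`, `optimal_q`, and `bisect_q`, and the use of plain bisection, are assumptions made for illustration; this is not the policy from the paper, only a sketch of how follow/ignore feedback can reveal what the platform needs to know.

```python
def posterior_good(prior_good: float, q_bad: float) -> float:
    """Posterior that the product is good after a 'buy' recommendation.

    Assumed scheme: recommend 'buy' with probability 1 when the product is
    good and with probability q_bad when it is bad.
    """
    return prior_good / (prior_good + (1.0 - prior_good) * q_bad)


def user_follows(prior_good: float, q_bad: float, threshold: float) -> bool:
    """Hypothetical user: follows 'buy' iff her posterior reaches the threshold."""
    return posterior_good(prior_good, q_bad) >= threshold


def optimal_q(prior_good: float, threshold: float) -> float:
    """Largest q_bad the user still follows, if the threshold were known.

    Requires 0 < prior_good < 1 and 0 < threshold <= 1.
    """
    bound = prior_good * (1.0 - threshold) / ((1.0 - prior_good) * threshold)
    return min(1.0, bound)


def bisect_q(prior_good: float, threshold: float, rounds: int) -> float:
    """Learn the largest followed q_bad from follow/ignore feedback only.

    The threshold is used solely to simulate the user's response; the
    search itself never reads it directly.
    """
    if user_follows(prior_good, 1.0, threshold):
        return 1.0  # always recommending 'buy' is already persuasive
    lo, hi = 0.0, 1.0  # lo is followed (posterior is 1), hi is not
    for _ in range(rounds):
        mid = (lo + hi) / 2.0
        if user_follows(prior_good, mid, threshold):
            lo = mid  # still persuasive: recommend 'buy' more often
        else:
            hi = mid  # too aggressive: the user ignores the recommendation
    return lo


if __name__ == "__main__":
    learned = bisect_q(prior_good=0.3, threshold=0.5, rounds=40)
    print(learned, optimal_q(prior_good=0.3, threshold=0.5))
```

In this toy instance the platform prefers the largest `q_bad` that the user still follows, so the learned value can be compared against `optimal_q`, which plays the role of the benchmark policy that knows the threshold.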


