Ranking in Contextual Multi-Armed Bandits

06/30/2022
by Amitis Shidani et al.

We study a ranking problem in the contextual multi-armed bandit setting. At each time step, a learning agent selects an ordered list of items and observes a stochastic outcome at each position. In online recommendation systems, simply showing the most attractive items in order is not necessarily optimal, since both position and item dependencies make the reward function more complicated. A simple example is the lack of diversity that arises when all of the most attractive items belong to the same category. We model position and item dependencies in the ordered list and design UCB- and Thompson Sampling-type algorithms for this problem. We prove a regret bound of O(L√(dT)) over T rounds and L positions, which matches previous work in its dependence on T and grows only linearly in L. Our work generalizes existing studies in several directions, including position dependencies, of which position discounting is a special case, and proposes a more general contextual bandit model.
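To make the setting concrete, below is a minimal sketch of a Thompson Sampling-style ranker for a linear contextual bandit with L positions. The feature map phi(item, position), the position-discount form, the placeholder click model, and the greedy top-L assignment are illustrative assumptions for this sketch, not the paper's exact model or algorithm.

```python
# Minimal sketch: Thompson Sampling for ranking in a linear contextual
# bandit with L positions. phi(item, pos), the position discount, and the
# click model below are illustrative assumptions, not the paper's method.
import numpy as np

d, L, n_items = 8, 5, 50          # feature dim, list length, pool size (assumed)
rng = np.random.default_rng(0)
theta_star = rng.normal(size=d)   # hidden true parameter (simulation only)

# Posterior state for Bayesian linear regression: theta ~ N(B^{-1} f, B^{-1}).
B = np.eye(d)        # precision matrix
f = np.zeros(d)      # reward-weighted feature sum

def phi(x, pos):
    """Illustrative item-position feature: discount item features by position."""
    return x / (1.0 + pos)

for t in range(1000):
    item_feats = rng.normal(size=(n_items, d))    # fresh context each round

    # 1. Sample a parameter vector from the current posterior.
    mu = np.linalg.solve(B, f)
    theta = rng.multivariate_normal(mu, np.linalg.inv(B))

    # 2. Greedily fill each position with the best remaining item
    #    under the sampled parameter (one item per position, no repeats).
    chosen = []
    for pos in range(L):
        scores = item_feats @ theta / (1.0 + pos)
        best = max((i for i in range(n_items) if i not in chosen),
                   key=lambda i: scores[i])
        chosen.append(best)

    # 3. Observe a stochastic outcome at each position, update the posterior.
    for pos, item in enumerate(chosen):
        x = phi(item_feats[item], pos)
        p = 1.0 / (1.0 + np.exp(-x @ theta_star))  # placeholder click model
        reward = rng.binomial(1, p)
        B += np.outer(x, x)
        f += reward * x
```

With a scalar position discount as above, the greedy assignment reduces to sorting items by sampled score; richer item-position dependencies of the kind the abstract describes would make the assignment step genuinely position-dependent.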


Related research

12/01/2019 · A Contextual-Bandit Approach to Online Learning to Rank for Relevance and Diversity
Online learning to rank (LTR) focuses on learning a policy from user int...

09/15/2012 · Thompson Sampling for Contextual Bandits with Linear Payoffs
Thompson Sampling is one of the oldest heuristics for multi-armed bandit...

01/28/2022 · Top-K Ranking Deep Contextual Bandits for Information Selection Systems
In today's technology environment, information is abundant, dynamic, and...

09/07/2020 · Learning to Rank under Multinomial Logit Choice
Learning the optimal ordering of content is an important challenge in we...

04/27/2020 · Learning to Rank in the Position Based Model with Bandit Feedback
Personalization is a crucial aspect of many online experiences. In parti...

05/14/2023 · Multi-View Interactive Collaborative Filtering
In many scenarios, recommender system user interaction data such as clic...

05/04/2020 · Categorized Bandits
We introduce a new stochastic multi-armed bandit setting where arms are ...
