Regret Minimisation in Multinomial Logit Bandits

03/01/2019
by Aadirupa Saha, et al.

We consider two regret minimisation problems over subsets of a finite ground set [n], with subset-wise relative preference feedback generated according to the multinomial logit (MNL) choice model. In the first setting, the learner plays subsets whose size is bounded by a given maximum and then receives top-m rank-ordered feedback; in the second, the learner is restricted to subsets of a fixed size k and observes a full ranking as feedback. For both settings, we devise new, order-optimal regret algorithms and derive fundamental limits on the regret performance of online learning with subset-wise preferences. Our results also show the value of eliciting general top-m rank-ordered feedback over single-winner feedback (m=1).
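
To make the feedback model concrete, the following is a minimal sketch of subset-wise MNL feedback, not the algorithms proposed in the paper. It assumes the standard MNL form, in which item i is chosen from a played subset S with probability theta_i / sum of theta_j over j in S, and it assumes that top-m rank-ordered feedback is produced by drawing winners one at a time without replacement (the Plackett-Luce construction). The score values and the function names `mnl_choice_probs` and `sample_top_m` are hypothetical and chosen only for illustration.

```python
import random


def mnl_choice_probs(scores, subset):
    """Return P(i | S) = theta_i / sum_{j in S} theta_j for every i in S."""
    total = sum(scores[i] for i in subset)
    return {i: scores[i] / total for i in subset}


def sample_top_m(scores, subset, m, rng):
    """Draw a top-m ranking from the played subset.

    Assumption: the ranking is built by repeatedly sampling an MNL
    winner from the items not yet ranked (sampling without replacement).
    """
    remaining = list(subset)
    ranking = []
    for _ in range(min(m, len(remaining))):
        weights = [scores[i] for i in remaining]
        winner = rng.choices(remaining, weights=weights, k=1)[0]
        ranking.append(winner)
        remaining.remove(winner)
    return ranking


# Hypothetical MNL scores for a ground set of n = 4 items.
theta = {0: 1.0, 1: 0.5, 2: 0.25, 3: 0.25}
rng = random.Random(0)  # seeded for reproducibility

probs = mnl_choice_probs(theta, [0, 1, 2])
top2 = sample_top_m(theta, [0, 1, 2, 3], m=2, rng=rng)
print(probs)
print(top2)
```

Setting m=1 in `sample_top_m` gives single-winner feedback, while setting m to the subset size gives a full ranking of the played subset.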

research
10/22/2018

Adversarial Online Learning with noise

We present and study models of adversarial online learning where the fee...
research
07/13/2019

Preselection Bandits under the Plackett-Luce Model

In this paper, we introduce the Preselection Bandit problem, in which th...
research
10/27/2020

Adversarial Dueling Bandits

We introduce the problem of regret minimization in Adversarial Dueling B...
research
06/16/2023

Understanding the Role of Feedback in Online Learning with Switching Costs

In this paper, we study the role of feedback in online learning with swi...
research
02/28/2018

RRR: Rank-Regret Representative

We propose the rank-regret representative as a way of choosing a small s...
research
08/30/2013

Online Ranking: Discrete Choice, Spearman Correlation and Other Feedback

Given a set V of n objects, an online ranking system outputs at each tim...
research
11/21/2017

Constructive Preference Elicitation over Hybrid Combinatorial Spaces

Preference elicitation is the task of suggesting a highly preferred confi...
