Best-item Learning in Random Utility Models with Subset Choices
We consider the problem of PAC learning the most valuable item from a pool of n items using sequential, adaptively chosen plays of subsets of k items, when, upon playing a subset, the learner receives relative feedback sampled according to a general Random Utility Model (RUM) with independent noise perturbations to the latent item utilities. We identify a new property of such a RUM, termed the minimum advantage, that helps in characterizing the complexity of separating pairs of items based on their relative win/loss empirical counts, and can be bounded as a function of the noise distribution alone. We give a learning algorithm for general RUMs, based on pairwise relative counts of items and hierarchical elimination, along with a new PAC sample complexity guarantee of O((n/(c²ϵ²)) log(k/δ)) rounds to identify an ϵ-optimal item with confidence 1-δ, when the worst-case pairwise advantage in the RUM has sensitivity at least c to the parameter gaps of items. Fundamental lower bounds on PAC sample complexity show that this is near-optimal in terms of its dependence on n, k, and c.
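The algorithmic idea described above, pairwise relative win counts combined with hierarchical elimination over subsets of size k, can be illustrated with a small simulation. The sketch below is not the paper's algorithm or its sample-complexity-tuned schedule: the Gumbel noise (which makes the RUM a Plackett-Luce model), the fixed plays_per_group budget, and the rule of advancing the item with the highest average empirical pairwise win fraction in its group are all illustrative assumptions.

```python
# Illustrative sketch only: Gumbel noise (a Plackett-Luce special case of a RUM),
# a fixed per-group play budget, and a simple advancement rule are assumptions
# made for this example, not the paper's algorithm or its tuned schedule.
from collections import defaultdict
import numpy as np

rng = np.random.default_rng(0)

def play_subset(theta, subset):
    """One play of `subset`: each item's latent utility theta[i] is perturbed
    by i.i.d. Gumbel noise and the noisy maximizer wins (top-1 feedback)."""
    subset = np.asarray(subset)
    noisy = theta[subset] + rng.gumbel(size=len(subset))
    return int(subset[np.argmax(noisy)])

def best_item(theta, k, plays_per_group=300):
    """Hierarchical elimination: split survivors into groups of at most k,
    play each group repeatedly, record pairwise win counts (the subset winner
    is credited with a win over every other item in the played subset), and
    advance the item with the highest average empirical pairwise win fraction."""
    survivors = list(range(len(theta)))
    while len(survivors) > 1:
        rng.shuffle(survivors)
        next_round = []
        for start in range(0, len(survivors), k):
            group = survivors[start:start + k]
            if len(group) == 1:                      # lone leftover advances
                next_round.append(group[0])
                continue
            wins = defaultdict(int)                  # wins[(i, j)]: i beat j
            for _ in range(plays_per_group):
                w = play_subset(theta, group)
                for j in group:
                    if j != w:
                        wins[(w, j)] += 1
            def score(i):                            # average pairwise win fraction of i
                fracs = [wins[(i, j)] / max(1, wins[(i, j)] + wins[(j, i)])
                         for j in group if j != i]
                return sum(fracs) / len(fracs)
            next_round.append(max(group, key=score))
        survivors = next_round
    return survivors[0]

if __name__ == "__main__":
    theta = np.linspace(0.0, 1.0, 20)   # latent utilities; item 19 is best
    print(best_item(theta, k=5))        # prints 19 with high probability
```

In this toy setup, increasing plays_per_group plays the role of the sample budget that the paper's guarantee expresses through ϵ, δ, and the sensitivity parameter c.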