Top-K Ranking Deep Contextual Bandits for Information Selection Systems

01/28/2022
by   Jade Freeman, et al.

In today's technology environment, information is abundant, dynamic, and heterogeneous. Automated filtering and prioritization of information rests on distinguishing whether a piece of information adds substantial value toward one's goal. Contextual multi-armed bandits have been widely used to learn to filter content and prioritize it according to user interest or relevance. Learning-to-rank techniques optimize the relevance ranking of items, allowing content to be selected accordingly. We propose a novel approach to top-K ranking under the contextual multi-armed bandit framework. We model the stochastic reward function with a neural network, whose non-linear approximation lets it learn the relationship between rewards and contexts. We demonstrate the approach and evaluate its learning performance in experiments using real-world data sets in simulated scenarios. Empirical results show that the approach performs well under a complex reward structure and high-dimensional contextual features.
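
As a rough illustration of the setting described above, the following is a minimal sketch of a top-K contextual bandit loop with a neural reward model. It is not the authors' algorithm: the abstract does not specify the exploration rule, network architecture, or data, so the epsilon-greedy exploration, the one-hidden-layer network `TinyRewardNet`, the helper `select_top_k`, the synthetic reward `tanh(x . theta)`, and all hyperparameters are assumptions made purely for illustration.

```python
import numpy as np


def select_top_k(scores, k, epsilon, rng):
    """Pick k distinct arms: a random subset with probability epsilon,
    otherwise the k arms with the highest predicted reward, best first.
    (Epsilon-greedy is an assumed exploration rule, not the paper's.)"""
    if rng.random() < epsilon:
        return rng.choice(len(scores), size=k, replace=False)
    return np.argsort(-scores)[:k]


class TinyRewardNet:
    """Hypothetical one-hidden-layer network mapping a context to a reward."""

    def __init__(self, dim, hidden, rng, lr=0.05):
        self.W1 = rng.normal(0.0, 0.5, (dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.5, hidden)
        self.b2 = 0.0
        self.lr = lr

    def predict(self, X):
        H = np.tanh(X @ self.W1 + self.b1)
        return H @ self.w2 + self.b2

    def update(self, X, r):
        """One gradient step on half the mean squared prediction error."""
        H = np.tanh(X @ self.W1 + self.b1)
        err = H @ self.w2 + self.b2 - r
        m = len(r)
        # Compute all gradients before touching any parameter.
        grad_w2 = H.T @ err / m
        grad_b2 = err.mean()
        dH = np.outer(err, self.w2) * (1.0 - H ** 2)
        grad_W1 = X.T @ dH / m
        grad_b1 = dH.mean(axis=0)
        self.W1 -= self.lr * grad_W1
        self.b1 -= self.lr * grad_b1
        self.w2 -= self.lr * grad_w2
        self.b2 -= self.lr * grad_b2


def run(n_rounds=2000, n_arms=10, dim=5, k=3, epsilon=0.1, seed=0):
    """Simulate the loop on synthetic data; return the mean reward per slot."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=dim)  # hidden parameter of the synthetic reward
    net = TinyRewardNet(dim, 16, rng)
    total = 0.0
    for _ in range(n_rounds):
        X = rng.normal(size=(n_arms, dim))  # one context vector per arm
        chosen = select_top_k(net.predict(X), k, epsilon, rng)
        # Only the K selected arms reveal a (noisy, non-linear) reward.
        rewards = np.tanh(X[chosen] @ theta) + 0.1 * rng.normal(size=k)
        net.update(X[chosen], rewards)
        total += rewards.sum()
    return total / (n_rounds * k)
```

Calling `run()` returns the average reward per selected item over the simulated rounds. The key structural point is that feedback arrives only for the K items actually shown, so the reward model is trained on the policy's own selections; this is why some form of exploration is needed.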

