Exploiting Relevance for Online Decision-Making in High-Dimensions

by Eralp Turğay, et al.

Many sequential decision-making tasks require choosing the right action at each decision step from a vast set of possibilities by extracting actionable intelligence from high-dimensional data streams. Most of the time, the high dimensionality of actions and data makes learning the optimal actions with traditional methods impractical. In this work, we investigate how to discover and leverage low-dimensional structure in actions and data to enable fast learning. As our learning model, we consider a structured contextual multi-armed bandit (CMAB) with high-dimensional arm (action) and context (data) sets, where the rewards depend only on a few relevant dimensions of the joint context-arm set. We depart from prior work by assuming a high-dimensional and uncountable arm set, and we allow the relevant context dimensions to vary across arms. We propose a new online learning algorithm called CMAB with Relevance Learning (CMAB-RL) and prove that its time-averaged regret asymptotically goes to zero. CMAB-RL enjoys a substantially improved regret bound compared to classical CMAB algorithms, whose regret depends on the full dimensions d_x and d_a of the context and arm sets. Importantly, we show that if the learner knows upper bounds d̄_x and d̄_a on the number of relevant context and arm dimensions, then CMAB-RL achieves Õ(T^(1 - 1/(2 + 2d̄_x + d̄_a))) regret. Finally, we illustrate how CMAB algorithms can be used for optimal personalized blood glucose control in type 1 diabetes mellitus patients, where the contexts represent multimodal physiological data streams obtained from sensor readings and the arms represent bolus insulin doses appropriate for injection, and show that CMAB-RL outperforms other contextual MAB algorithms in this task.
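To give a rough intuition for why regret should scale with the number of relevant dimensions rather than the full dimension, the sketch below discretizes only an (assumed known) set of relevant context dimensions into a uniform grid and runs UCB1 per cell with a finite arm set. This is not the paper's CMAB-RL, which additionally learns which dimensions are relevant and handles uncountable arm sets; the class name, grid resolution, and reward model are illustrative assumptions.

```python
import numpy as np

class RelevantDimUCB:
    """Sketch of a contextual bandit that discretizes only the relevant
    context dimensions into a uniform grid over [0, 1] and runs UCB1
    independently in each grid cell. The number of cells grows with the
    relevant dimension count, not the ambient dimension d_x."""

    def __init__(self, n_arms, relevant_dims, bins=4):
        self.n_arms = n_arms
        self.relevant_dims = relevant_dims  # indices of relevant context dims
        self.bins = bins                    # grid resolution per relevant dim
        self.counts = {}                    # (cell, arm) -> pull count
        self.means = {}                     # (cell, arm) -> empirical mean reward

    def _cell(self, context):
        # Map the relevant coordinates of a context in [0,1]^d_x to a grid cell.
        coords = np.asarray(context)[self.relevant_dims]
        idx = np.minimum((coords * self.bins).astype(int), self.bins - 1)
        return tuple(idx)

    def select(self, context, t):
        cell = self._cell(context)
        ucbs = []
        for a in range(self.n_arms):
            n = self.counts.get((cell, a), 0)
            if n == 0:
                return a  # pull each arm in this cell at least once
            ucbs.append(self.means[(cell, a)] + np.sqrt(2 * np.log(t + 1) / n))
        return int(np.argmax(ucbs))

    def update(self, context, arm, reward):
        key = (self._cell(context), arm)
        n = self.counts.get(key, 0)
        mean = self.means.get(key, 0.0)
        self.counts[key] = n + 1
        self.means[key] = mean + (reward - mean) / (n + 1)
```

Because the grid is built only over the relevant coordinates, the per-cell sample counts (and hence the learning speed) are unaffected by how many irrelevant context dimensions the stream carries.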




RELEAF: An Algorithm for Learning and Exploiting Relevance

Recommender systems, medical diagnosis, network security, etc., require ...

Multi-objective Contextual Bandit Problem with Similarity Information

In this paper we propose the multi-objective contextual bandit problem w...

Multi-Task Learning for Contextual Bandits

Contextual bandits are a form of multi-armed bandit in which the agent h...

Online Learning and Decision-Making under Generalized Linear Model with High-Dimensional Data

We propose a minimax concave penalized multi-armed bandit algorithm unde...

Bayesian Linear Bandits for Large-Scale Recommender Systems

Potentially, taking advantage of available side information boosts the p...

Contextual Bandits with Latent Confounders: An NMF Approach

Motivated by online recommendation and advertising systems, we consider ...

Contextual Bandits and Optimistically Universal Learning

We consider the contextual bandit problem on general action and context ...
