Meta-learning with Stochastic Linear Bandits

by Leonardo Cella et al.
Istituto Italiano di Tecnologia

We investigate meta-learning procedures in the setting of stochastic linear bandit tasks. The goal is to select a learning algorithm that works well on average over a class of bandit tasks sampled from a task distribution. Inspired by recent work on learning-to-learn linear regression, we consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularizer is the squared Euclidean distance to a bias vector. We first study the benefit of the biased OFUL algorithm in terms of regret minimization. We then propose two strategies to estimate the bias within the learning-to-learn setting. We show, both theoretically and experimentally, that when the number of tasks grows and the variance of the task distribution is small, our strategies have a significant advantage over learning the tasks in isolation.
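To make the biased regularization concrete, here is a minimal sketch of the regularized least-squares estimate the abstract alludes to, i.e. the minimizer of ||Xw - y||^2 + lam * ||w - bias||^2, together with an optimistic (UCB-style) arm selection built on it. The confidence width `beta`, the function names, and the exact selection rule are illustrative assumptions, not the paper's precise algorithm:

```python
import numpy as np

def biased_ridge(X, y, bias, lam):
    """Biased ridge regression: argmin_w ||Xw - y||^2 + lam * ||w - bias||^2.

    Closed form: w = (X^T X + lam*I)^{-1} (X^T y + lam * bias).
    With lam -> 0 this is ordinary least squares; with lam -> inf the
    estimate shrinks toward `bias` instead of toward zero.
    """
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    return np.linalg.solve(A, X.T @ y + lam * bias)

def biased_oful_step(X, y, arms, bias, lam, beta):
    """One round of a biased-OFUL-style selection over a finite arm set.

    `beta` is an assumed confidence width (a tuning parameter here; the
    paper derives its value from the regret analysis).  Returns the index
    of the arm maximizing the optimistic reward estimate.
    """
    d = arms.shape[1]
    A = X.T @ X + lam * np.eye(d)
    A_inv = np.linalg.inv(A)
    w_hat = A_inv @ (X.T @ y + lam * bias)
    # Optimistic index: estimated reward plus an exploration bonus
    # proportional to the arm's Mahalanobis norm under A^{-1}.
    bonus = np.sqrt(np.einsum("ij,jk,ik->i", arms, A_inv, arms))
    return int(np.argmax(arms @ w_hat + beta * bonus))
```

When the bias vector is close to the task's true parameter (the low task-variance regime the abstract highlights), the regularizer pulls the estimate toward the right answer from the very first rounds, which is the intuition behind the claimed advantage over learning each task in isolation.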



