Meta-Learning Hypothesis Spaces for Sequential Decision-making

by Parnian Kassraie, et al.

Obtaining reliable, adaptive confidence sets for prediction functions (hypotheses) is a central challenge in sequential decision-making tasks, such as bandits and model-based reinforcement learning. These confidence sets typically rely on prior assumptions on the hypothesis space, e.g., the known kernel of a Reproducing Kernel Hilbert Space (RKHS). Hand-designing such kernels is error-prone, and misspecification may lead to poor or unsafe performance. In this work, we propose to meta-learn a kernel from offline data (Meta-KeL). For the case where the unknown kernel is a combination of known base kernels, we develop an estimator based on structured sparsity. Under mild conditions, we guarantee that our estimated RKHS yields valid confidence sets that, with increasing amounts of offline data, become as tight as those given the true unknown kernel. We demonstrate our approach on the kernelized bandit problem (a.k.a. Bayesian optimization), where we establish regret bounds competitive with those given the true kernel. We also empirically evaluate the effectiveness of our approach on a Bayesian optimization task.
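To make the structured-sparsity idea concrete, the following is a minimal illustrative sketch (not the paper's exact Meta-KeL estimator): we model offline targets as f(x) ≈ Σ_j K_j(x, X) β_j over a dictionary of known base kernels, and apply a group-lasso penalty over each β_j so that unused base kernels are zeroed out entirely. The estimated relative weight of each base kernel then defines the learned combination. All function names and parameters here are hypothetical choices for the sketch.

```python
import numpy as np

def meta_learn_kernel(X, y, base_kernels, lam=0.1, n_iters=500):
    """Illustrative group-lasso estimate of which base kernels are active.

    Model: f(x) ~ sum_j K_j(x, X) beta_j, with a group penalty
    lam * sum_j ||beta_j||_2 that zeroes out whole groups
    (structured sparsity over base kernels). Solved by proximal
    gradient (ISTA) with group soft-thresholding.
    """
    n = len(y)
    Ks = [k(X, X) for k in base_kernels]          # base-kernel Gram matrices
    betas = [np.zeros(n) for _ in Ks]
    # crude step size from an upper bound on the Lipschitz constant
    L = sum(np.linalg.norm(K, 2) ** 2 for K in Ks)
    lr = 1.0 / L
    for _ in range(n_iters):
        resid = sum(K @ b for K, b in zip(Ks, betas)) - y
        for j, K in enumerate(Ks):
            g = K @ resid                          # grad of 0.5*||resid||^2 wrt beta_j
            b = betas[j] - lr * g
            norm = np.linalg.norm(b)
            # group soft-thresholding: shrink the whole group, possibly to zero
            betas[j] = max(0.0, 1.0 - lr * lam / (norm + 1e-12)) * b
    weights = np.array([np.linalg.norm(b) for b in betas])
    return weights / (weights.sum() + 1e-12), betas  # relative kernel weights

# toy usage: data truly generated by the linear kernel, dictionary {linear, RBF}
lin = lambda A, B: A @ B.T
rbf = lambda A, B: np.exp(-0.5 * ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = X @ np.array([1.5, -2.0])
w, betas = meta_learn_kernel(X, y, [lin, rbf], lam=0.5)
pred = sum(k(X, X) @ b for k, b in zip([lin, rbf], betas))
```

In the full method, the learned combination would then define the RKHS used to build confidence sets for the downstream bandit algorithm; the sketch stops at weight estimation.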

