How Important is the Train-Validation Split in Meta-Learning?

by Yu Bai, et al.

Meta-learning aims to perform fast adaptation on a new task through learning a "prior" from multiple existing tasks. A common practice in meta-learning is to perform a train-validation split where the prior adapts to the task on one split of the data, and the resulting predictor is evaluated on another split. Despite its prevalence, the importance of the train-validation split is not well understood either in theory or in practice, particularly in comparison to the more direct non-splitting method, which uses all the per-task data for both training and evaluation. We provide a detailed theoretical study on whether and when the train-validation split is helpful on the linear centroid meta-learning problem, in the asymptotic setting where the number of tasks goes to infinity. We show that the splitting method converges to the optimal prior as expected, whereas the non-splitting method does not in general without structural assumptions on the data. In contrast, if the data are generated from linear models (the realizable regime), we show that both the splitting and non-splitting methods converge to the optimal prior. Further, perhaps surprisingly, our main result shows that the non-splitting method achieves a strictly better asymptotic excess risk under this data distribution, even when the regularization parameter and split ratio are optimally tuned for both methods. Our results highlight that data splitting may not always be preferable, especially when the data are realizable by the model. We validate our theory experimentally, showing that the non-splitting method can indeed outperform the splitting method on both simulations and real meta-learning tasks.
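The two methods compared in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes that per-task adaptation is ridge regression shrunk toward the prior `w0` with strength `lam`, that the outer objective is the average squared error of the adapted predictor, and that tasks are noiseless linear models whose weights scatter around a common centroid `w_star`. The function names (`adapt_terms`, `meta_fit`, `simulate_tasks`) and all parameter values are hypothetical.

```python
import numpy as np

def adapt_terms(X_fit, y_fit, X_eval, y_eval, lam):
    """Express the evaluation residual of the adapted predictor as r - M @ w0.

    Assumed adaptation: w = argmin_w ||X_fit w - y_fit||^2 / n + lam * ||w - w0||^2,
    whose closed form is w = B (X_fit^T y_fit + n * lam * w0).
    """
    n, d = X_fit.shape
    B = np.linalg.inv(X_fit.T @ X_fit + n * lam * np.eye(d))
    M = n * lam * X_eval @ B                      # part of the prediction that depends on w0
    r = y_eval - X_eval @ B @ (X_fit.T @ y_fit)   # residual when w0 = 0
    return M, r

def meta_fit(tasks, lam, n_train=None):
    """Estimate the prior w0 by least squares over all tasks.

    n_train=None: non-splitting method (adapt and evaluate on all task data).
    n_train=k:    splitting method (adapt on the first k rows, evaluate on the rest).
    """
    d = tasks[0][0].shape[1]
    A = np.zeros((d, d))
    b = np.zeros(d)
    for X, y in tasks:
        if n_train is None:
            M, r = adapt_terms(X, y, X, y, lam)
        else:
            M, r = adapt_terms(X[:n_train], y[:n_train], X[n_train:], y[n_train:], lam)
        # Accumulate the normal equations of sum_t ||r_t - M_t w0||^2 / n_eval.
        A += M.T @ M / len(r)
        b += M.T @ r / len(r)
    return np.linalg.solve(A, b)

def simulate_tasks(num_tasks, n, d, rng, spread=1.0):
    """Noiseless linear tasks (assumed realizable setting) around a centroid w_star."""
    w_star = rng.standard_normal(d)
    tasks = []
    for _ in range(num_tasks):
        w_t = w_star + spread * rng.standard_normal(d) / np.sqrt(d)
        X = rng.standard_normal((n, d))
        tasks.append((X, X @ w_t))
    return tasks, w_star

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tasks, w_star = simulate_tasks(num_tasks=2000, n=20, d=5, rng=rng)
    w_split = meta_fit(tasks, lam=0.5, n_train=10)
    w_nosplit = meta_fit(tasks, lam=0.5)
    print("split error:    ", np.linalg.norm(w_split - w_star))
    print("non-split error:", np.linalg.norm(w_nosplit - w_star))
```

Under these assumptions both estimates should land close to `w_star` when the number of tasks is large, which matches the abstract's claim about the realizable regime. Which method has the smaller error in a given finite run depends on `lam`, the split ratio, and the random seed, so a single run should not be read as confirming the asymptotic comparison.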




Continuous Meta-Learning without Tasks

Meta-learning is a promising strategy for learning to efficiently learn ...

A New Optimality Property of Strang's Splitting

For systems of the form q̇ = M^-1 p, ṗ = -Aq+f(q), common in many applic...

Prediction of Hemolysis Tendency of Peptides using a Reliable Evaluation Method

There are numerous peptides discovered through past decades, which exhib...

On Optimality of Meta-Learning in Fixed-Design Regression with Weighted Biased Regularization

We consider a fixed-design linear regression in the meta-learning model ...

Generating meta-learning tasks to evolve parametric loss for classification learning

The field of meta-learning has seen a dramatic rise in interest in recen...

Bayes meets Bernstein at the Meta Level: an Analysis of Fast Rates in Meta-Learning with PAC-Bayes

Bernstein's condition is a key assumption that guarantees fast rates in ...

To Split or Not to Split: The Impact of Disparate Treatment in Classification

Disparate treatment occurs when a machine learning model produces differ...
