Chameleon: Learning Model Initializations Across Tasks With Different Schemas

by Lukas Brinkmeyer, et al.

Parametric models, and particularly neural networks, require a weight initialization as the starting point for gradient-based optimization. In current practice, this is usually accomplished with some form of random initialization. Recent work instead shows that an initial parameter set can be learned from a population of tasks, i.e., datasets with target variables for supervised learning; using this learned initialization leads to faster convergence on new tasks (model-agnostic meta-learning). Current methods for learning model initializations, however, are limited to populations of tasks that share the same schema, i.e., the same number, order, type, and semantics of predictor and target variables. In this paper, we address meta-learning of parameter initializations across tasks with different schemas, i.e., tasks whose number of predictors varies while some variables are still shared. We propose Chameleon, a model that learns to align different predictor schemas to a common representation, trained on permutations and masks of the predictors of the meta-training tasks. In experiments on real-life datasets, we show that Chameleon can successfully learn parameter initializations across tasks with different schemas, yielding an average accuracy lift of 26% over random initialization and of 5% over a state-of-the-art method for fixed-schema learned initializations. To the best of our knowledge, this is the first work on learning model initializations across tasks with different schemas.
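The core idea of aligning a variable number of predictors to a common representation can be illustrated with a small sketch. This is not the paper's actual architecture: the per-feature statistics, the function name, and the single linear scoring layer are assumptions made here for illustration. The sketch soft-assigns each of a task's F predictor columns to one of K fixed latent slots, so tasks with different schemas map to the same K-dimensional input space; because the assignment is computed per column, the output is invariant to permutations of the predictors, which is the property the permutation-based training in the paper encourages.

```python
import numpy as np

def chameleon_align(X, W):
    """Soft-align a task's F predictor columns to K latent slots.

    X: (N, F) task data with a task-specific number F of predictors.
    W: (S, K) scoring weights over S per-column statistics (hypothetical
       stand-in for a learned alignment network).
    Returns an (N, K) representation in a shared, fixed-size schema.
    """
    # Per-column summary statistics (mean, std, mean absolute value) -> (F, S).
    stats = np.stack([X.mean(axis=0), X.std(axis=0), np.abs(X).mean(axis=0)], axis=1)
    scores = stats @ W                                  # (F, K) slot scores
    # Softmax over the K slots for each predictor column.
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    A = e / e.sum(axis=1, keepdims=True)                # (F, K) alignment matrix
    # Project the data through the alignment: columns are merged into slots.
    return X @ A                                        # (N, K)
```

Because the alignment matrix is built column-wise, reordering the predictors reorders the rows of `A` identically, and `X @ A` is unchanged; two tasks with, say, 6 and 3 predictors both land in the same K-dimensional space.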




