Gradient Agreement as an Optimization Objective for Meta-Learning

by Amir Erfan Eshratifar, et al.

This paper presents a novel optimization method for maximizing generalization over tasks in meta-learning. The goal of meta-learning is to learn a model that allows an agent to adapt rapidly when presented with previously unseen tasks. Tasks are sampled from a specific distribution, which is assumed to be similar for both seen and unseen tasks. We focus on a family of meta-learning methods that learn initial parameters of a base model which can be fine-tuned quickly on a new task by a few gradient steps (e.g., MAML). Our approach is based on pushing the model parameters in a direction on which the tasks agree. If the gradients of a task agree with the parameter update vector, their inner product is a large positive value. As a result, given a batch of tasks to be optimized for, we associate a positive (negative) weight with the loss function of a task if the inner product between its gradients and the average of the gradients of all tasks in the batch is positive (negative). The degree to which a task contributes to the parameter update is therefore controlled by introducing a set of weights on the loss functions of the tasks. Our method can be easily integrated with current meta-learning algorithms for neural networks. Our experiments demonstrate that it yields models with better generalization compared to MAML and Reptile.
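The weighting scheme described above can be sketched in a few lines of NumPy. This is an illustrative reading of the abstract, not the authors' implementation: each task's weight is taken proportional to the inner product of its (flattened) gradient with the batch-average gradient, and the normalization by the sum of absolute inner products is an assumption made here for concreteness.

```python
import numpy as np

def gradient_agreement_weights(task_grads):
    """Weight each task by the inner product of its gradient with the
    average gradient over the batch of tasks. Tasks whose gradients
    agree with the average get positive weights; tasks that disagree
    get negative weights. Normalization choice is an assumption.
    """
    g = np.stack(task_grads)        # shape: (num_tasks, num_params)
    g_avg = g.mean(axis=0)          # average gradient across tasks
    inner = g @ g_avg               # per-task inner product with the average
    return inner / np.sum(np.abs(inner))

def meta_update(params, task_grads, lr=0.01):
    """One outer-loop step: combine the task gradients using the
    agreement weights instead of a plain average."""
    w = gradient_agreement_weights(task_grads)
    update = sum(wi * gi for wi, gi in zip(w, task_grads))
    return params - lr * update
```

For example, a task whose gradient points opposite to the batch average receives a negative weight, so its loss contributes in reverse to the update, which is the mechanism the abstract describes for suppressing conflicting tasks.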


Towards Understanding Generalization in Gradient-Based Meta-Learning

In this work we study generalization of neural networks in gradient-base...

Large-Scale Meta-Learning with Continual Trajectory Shifting

Meta-learning of shared initialization parameters has shown to be highly...

Reptile: a Scalable Metalearning Algorithm

This paper considers metalearning problems, where there is a distributio...

MetaLDC: Meta Learning of Low-Dimensional Computing Classifiers for Fast On-Device Adaption

Fast model updates for unseen tasks on intelligent edge devices are cruc...

Meta Learning by the Baldwin Effect

The scope of the Baldwin effect was recently called into question by two...

Multi-Domain Learning by Meta-Learning: Taking Optimal Steps in Multi-Domain Loss Landscapes by Inner-Loop Learning

We consider a model-agnostic solution to the problem of Multi-Domain Lea...

Dynamic Kernel Selection for Improved Generalization and Memory Efficiency in Meta-learning

Gradient based meta-learning methods are prone to overfit on the meta-tr...