Preventing Catastrophic Forgetting in Continual Learning of New Natural Language Tasks

02/22/2023
by Sudipta Kar, et al.

Multi-Task Learning (MTL) is widely accepted in Natural Language Processing as a standard technique for learning multiple related tasks in one model. Training an MTL model requires having the training data for all tasks available at the same time. As systems usually evolve over time (e.g., to support new functionalities), adding a new task to an existing MTL model usually requires retraining the model from scratch on all the tasks, which can be time-consuming and computationally expensive. Moreover, in some scenarios, the data used to train the original model may no longer be available, for example, due to storage or privacy concerns. In this paper, we approach the problem of incrementally expanding MTL models' capability to solve new tasks over time by distilling the knowledge of an already trained model on n tasks into a new one for solving n+1 tasks. To avoid catastrophic forgetting, we propose to exploit unlabeled data from the same distributions of the old tasks. Our experiments on publicly available benchmarks show that such a technique dramatically benefits the distillation by preserving the already acquired knowledge (i.e., preventing up to 20% performance drops on the old tasks) while obtaining good performance on the incrementally added tasks. Further, we also show that our approach is beneficial in practical settings by using data from a leading voice assistant.
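The core recipe described above, learning the new task from its labeled data while distilling the old model's behaviour on unlabeled data drawn from the old tasks' input distributions, can be sketched roughly as follows. This is a minimal, illustrative sketch assuming a PyTorch-style multi-task model with per-task output heads; the function name, the task_id batch field, and the temperature/alpha hyperparameters are assumptions made for illustration, not the paper's exact training procedure.

```python
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, new_task_batch, unlabeled_old_batch,
                      temperature=2.0, alpha=0.5):
    """One training step: learn the new (n+1)-th task from labels while
    distilling the frozen teacher's behaviour on the old n tasks from
    unlabeled data only. (Hypothetical model/batch interface.)"""
    # Supervised loss on the newly added task, for which labeled data exists.
    new_logits = student(new_task_batch["input_ids"],
                         task_id=new_task_batch["task_id"])
    ce_loss = F.cross_entropy(new_logits, new_task_batch["labels"])

    # Distillation loss on unlabeled inputs from the old tasks' distributions:
    # the student matches the teacher's soft predictions, so no old labels
    # (and no old training set) are required.
    with torch.no_grad():
        teacher_logits = teacher(unlabeled_old_batch["input_ids"],
                                 task_id=unlabeled_old_batch["task_id"])
    student_logits = student(unlabeled_old_batch["input_ids"],
                             task_id=unlabeled_old_batch["task_id"])
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    # Blend the two objectives; alpha trades new-task learning against
    # preservation of the old tasks' knowledge.
    return alpha * ce_loss + (1 - alpha) * kd_loss
```

Because the teacher stays frozen and only unlabeled inputs are needed for the old tasks, this setup removes the dependency on the original labeled training data, which may no longer be available for storage or privacy reasons.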
