Investigating Forgetting in Pre-Trained Representations Through Continual Learning

by Yun Luo, et al.

Representation forgetting refers to the drift of contextualized representations during continual training. Intuitively, representation forgetting can affect the general knowledge stored in pre-trained language models (LMs), but its concrete effect is still unclear. In this paper, we study the effect of representation forgetting on the generality of pre-trained language models, i.e., their potential capability for tackling future downstream tasks. Specifically, we design three metrics, overall generality destruction (GD), syntactic knowledge forgetting (SynF), and semantic knowledge forgetting (SemF), to measure the evolution of general knowledge in continual learning. Through extensive experiments, we find that generality is degraded in various pre-trained LMs, and that syntactic and semantic knowledge is forgotten through continual learning. Based on our experiments and analysis, we further derive two insights into alleviating general knowledge forgetting: 1) training on general linguistic tasks first can mitigate general knowledge forgetting; 2) a hybrid continual learning method can mitigate generality destruction and retain more general knowledge than methods that use only rehearsal or only regularization.


