Continual Pre-Training Mitigates Forgetting in Language and Vision

by Andrea Cossu, et al.
Scuola Normale Superiore · University of Pisa · KU Leuven

Pre-trained models are nowadays a fundamental component of machine learning research. In continual learning, they are commonly used to initialize the model before training on the stream of non-stationary data, but pre-training itself is rarely applied during continual learning. We formalize and investigate the characteristics of the continual pre-training scenario in both language and vision environments, where a model is continually pre-trained on a stream of incoming data and only later fine-tuned on different downstream tasks. We show that continually pre-trained models are robust against catastrophic forgetting, and we provide strong empirical evidence that self-supervised pre-training is more effective at retaining previous knowledge than supervised protocols. Code is provided at .
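The protocol the abstract describes, continually pre-training on a stream of experiences and only later fine-tuning on downstream tasks to measure forgetting, can be sketched with a toy example. This is an illustrative assumption-laden sketch, not the paper's actual pipeline: the masked-reconstruction pretext task, the linear probe used as the "fine-tuning" stage, and the synthetic Gaussian tasks are all stand-ins for the real language/vision setups.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_experience(d=20, n=200, shift=0):
    """Toy task: two Gaussian classes whose mean direction shifts across experiences."""
    mu = np.zeros(d)
    mu[shift % d] = 2.0
    X = np.vstack([rng.normal(-mu, 1.0, size=(n, d)),
                   rng.normal(mu, 1.0, size=(n, d))])
    y = np.array([0] * n + [1] * n)
    return X, y

def pretrain(W, X, steps=100, lr=1e-3, mask_p=0.3):
    """Self-supervised pretext: reconstruct masked inputs through tied weights W (d x k)."""
    for _ in range(steps):
        mask = (rng.random(X.shape) > mask_p).astype(float)
        Xm = X * mask                              # randomly mask input features
        E = Xm @ W @ W.T - X                       # reconstruction error
        grad = 2 * (Xm.T @ E @ W + E.T @ Xm @ W)   # gradient of ||E||^2 w.r.t. W
        W -= lr * grad / len(X)
    return W

def probe_accuracy(W, X, y, steps=200, lr=0.1):
    """Downstream stage: logistic-regression probe on frozen features H = X W."""
    H = X @ W
    w, b = np.zeros(H.shape[1]), 0.0
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(H @ w + b)))
        w -= lr * H.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return np.mean((1 / (1 + np.exp(-(H @ w + b))) > 0.5) == y)

d, k = 20, 8
W = rng.normal(scale=0.1, size=(d, k))           # shared encoder weights
stream = [make_experience(d, shift=s) for s in (0, 5, 10)]

accs = []
for i, (X, y) in enumerate(stream):
    W = pretrain(W, X)                           # continual pre-training on experience i
    accs.append([probe_accuracy(W, Xp, yp)       # probe on all seen tasks: forgetting check
                 for Xp, yp in stream[:i + 1]])
print(accs[-1])  # accuracy on tasks 0..2 after pre-training on the full stream
```

Forgetting here would show up as `accs[-1][0]` dropping well below `accs[0][0]`; the paper's claim is that with continual (especially self-supervised) pre-training, downstream performance on earlier data stays largely intact.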




Code Repositories


Code to reproduce experiments from the paper "Continual Pre-Training Mitigates Forgetting in Language and Vision"
