An Empirical Study of Catastrophic Forgetting in Large Language Models During Continual Fine-tuning

by Yun Luo et al.

Catastrophic forgetting (CF) is a phenomenon in machine learning in which a model forgets previously learned information as it learns new information. Given the excellent performance of large language models (LLMs), it is natural to ask whether CF arises during their continual fine-tuning. In this study, we empirically evaluate forgetting in LLMs from the perspectives of domain knowledge, reasoning, and reading comprehension. The experiments demonstrate that catastrophic forgetting is generally observed in LLMs ranging from 1b to 7b parameters, and that as the model scale increases, the severity of forgetting also intensifies. Comparing the decoder-only model BLOOMZ with the encoder-decoder model mT0, we find that BLOOMZ suffers less forgetting and maintains more knowledge. We also observe that LLMs can mitigate language bias (e.g. gender bias) during continual fine-tuning. Moreover, we find that ALPACA maintains more knowledge and capability than LLAMA during continual fine-tuning, which implies that general instruction tuning can help mitigate forgetting in subsequent fine-tuning.
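The abstract does not specify how forgetting is quantified; a common measure in the continual learning literature is the average drop from a task's best accuracy (while training on later tasks) to its final accuracy. A minimal sketch of that metric, with a hypothetical accuracy matrix, could look like this:

```python
def average_forgetting(acc):
    """Average forgetting over sequentially learned tasks.

    acc[t][i] is the accuracy on task i measured after finishing
    training on task t (tasks are trained in order 0..T-1).
    Forgetting for task i is the drop from its best accuracy at any
    earlier stage to its accuracy after the final task.
    """
    T = len(acc)
    if T < 2:
        return 0.0
    drops = []
    for i in range(T - 1):  # the last task cannot have been forgotten yet
        best_earlier = max(acc[t][i] for t in range(i, T - 1))
        drops.append(best_earlier - acc[T - 1][i])
    return sum(drops) / len(drops)

# Hypothetical accuracies for three sequentially fine-tuned tasks
# (illustrative numbers only, not results from the paper):
acc = [
    [0.80, 0.10, 0.05],  # after fine-tuning on task 0
    [0.60, 0.85, 0.10],  # after fine-tuning on task 1
    [0.40, 0.70, 0.90],  # after fine-tuning on task 2
]
print(average_forgetting(acc))  # → 0.275
```

Here forgetting averages the drops (0.80 − 0.40) and (0.85 − 0.70); a larger value means more severe forgetting, which is how one would compare models of different scales.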



