Preserving Commonsense Knowledge from Pre-trained Language Models via Causal Inference

by Junhao Zheng, et al.

Fine-tuning has proven to be a simple and effective technique for transferring the knowledge learned by Pre-trained Language Models (PLMs) to downstream tasks. However, vanilla fine-tuning easily overfits the target data and degrades generalization ability. Most existing studies attribute this to catastrophic forgetting, and they retain the pre-trained knowledge indiscriminately without identifying which knowledge is transferable. Motivated by this, we frame fine-tuning as a causal graph and discover that the crux of catastrophic forgetting lies in the missing causal effects from the pre-trained data. Based on this causal view, we propose a unified objective for fine-tuning that recovers these missing causal effects. Intriguingly, the unified objective can be seen as the sum of the vanilla fine-tuning objective, which learns new knowledge from target data, and a causal objective, which preserves old knowledge from PLMs. Therefore, our method is flexible and can mitigate negative transfer while preserving knowledge. Since endowing models with commonsense is a long-standing challenge, we implement our method on commonsense QA with a proposed heuristic estimation to verify its effectiveness. In the experiments, our method outperforms state-of-the-art fine-tuning methods on all six commonsense QA datasets and can be implemented as a plug-in module to improve the performance of existing QA models.
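
The decomposition described in the abstract can be written schematically as follows. The symbols are assumptions introduced here for illustration only; the abstract states only that the unified objective is a sum of two terms and does not give the form of either term.

```latex
% Schematic only: symbol names are illustrative, not taken from the paper.
% L_FT     : vanilla fine-tuning objective (learns new knowledge from target data)
% L_causal : causal objective (preserves old knowledge from the PLM)
\mathcal{L}_{\text{unified}} = \mathcal{L}_{\text{FT}} + \mathcal{L}_{\text{causal}}
```

Read this way, dropping the second term gives back vanilla fine-tuning, which is consistent with the abstract's description of the method as usable as a plug-in on top of existing QA models.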




CoCoLM: COmplex COmmonsense Enhanced Language Model

Large-scale pre-trained language models have demonstrated strong knowled...

Enhancing Language Models with Plug-and-Play Large-Scale Commonsense

We study how to enhance language models (LMs) with textual commonsense k...

PANDA: Prompt Transfer Meets Knowledge Distillation for Efficient Model Adaptation

Prompt-tuning, which freezes pretrained language models (PLMs) and only ...

Plug-and-Play Adaptation for Continuously-updated QA

Language models (LMs) have shown great potential as implicit knowledge b...

An Empirical Study of Catastrophic Forgetting in Large Language Models During Continual Fine-tuning

Catastrophic forgetting (CF) is a phenomenon that occurs in machine lear...

Alleviating Representational Shift for Continual Fine-tuning

We study a practical setting of continual learning: fine-tuning on a pre...

Neuro-Symbolic Causal Language Planning with Commonsense Prompting

Language planning aims to implement complex high-level goals by decompos...
