How Should Pre-Trained Language Models Be Fine-Tuned Towards Adversarial Robustness?

12/22/2021
by Xinshuai Dong, et al.

The fine-tuning of pre-trained language models has achieved great success in many NLP fields. Yet, it is strikingly vulnerable to adversarial examples: for instance, word substitution attacks using only synonyms can easily fool a BERT-based sentiment analysis model. In this paper, we demonstrate that adversarial training, the prevalent defense technique, does not directly fit the conventional fine-tuning scenario, because it suffers severely from catastrophic forgetting: it fails to retain the generic and robust linguistic features already captured by the pre-trained model. In this light, we propose Robust Informative Fine-Tuning (RIFT), a novel adversarial fine-tuning method motivated from an information-theoretical perspective. In particular, RIFT encourages the objective model to retain the features learned from the pre-trained model throughout the entire fine-tuning process, whereas conventional fine-tuning uses the pre-trained weights only for initialization. Experimental results show that RIFT consistently outperforms state-of-the-art methods on two popular NLP tasks, sentiment analysis and natural language inference, under different attacks and across various pre-trained language models.
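To make the general recipe in the abstract concrete, below is a minimal PyTorch sketch of adversarial fine-tuning combined with an information-theoretic retention term: the fine-tuned encoder is trained on perturbed inputs while an InfoNCE-style lower bound on mutual information ties its features back to a frozen copy of the pre-trained encoder. Everything here is an illustrative assumption rather than the paper's implementation: the ToyEncoder stands in for a real language model such as BERT, the single FGSM-style gradient step stands in for a proper word-substitution attack, and the InfoNCE bound, temperature, and retention weight are placeholder choices.

```python
# Sketch of adversarial fine-tuning with a feature-retention term.
# All modules, the attack, and hyperparameters are illustrative stand-ins,
# not RIFT's exact objective.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyEncoder(nn.Module):
    """Stand-in for a pre-trained language model encoder (e.g., BERT)."""
    def __init__(self, dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return self.net(x)

def info_nce(z_ft, z_pre, temperature=0.1):
    """InfoNCE lower bound on mutual information: each fine-tuned feature
    must identify its own pre-trained counterpart within the batch."""
    z_ft = F.normalize(z_ft, dim=-1)
    z_pre = F.normalize(z_pre, dim=-1)
    logits = z_ft @ z_pre.t() / temperature        # (B, B) similarity matrix
    targets = torch.arange(z_ft.size(0))           # positives on the diagonal
    return F.cross_entropy(logits, targets)

dim, batch = 32, 16
pretrained = ToyEncoder(dim)                       # frozen reference copy
finetuned = ToyEncoder(dim)
finetuned.load_state_dict(pretrained.state_dict())  # initialize from pre-trained
for p in pretrained.parameters():
    p.requires_grad_(False)
head = nn.Linear(dim, 2)                           # task classifier
opt = torch.optim.Adam(list(finetuned.parameters()) + list(head.parameters()), lr=1e-3)

x = torch.randn(batch, dim)                        # placeholder "sentence" inputs
y = torch.randint(0, 2, (batch,))

for step in range(3):
    # Crude adversarial example: one FGSM-style step on the input embedding
    # (a real attack would substitute synonyms in the token sequence).
    x_adv = x.clone().requires_grad_(True)
    loss_clean = F.cross_entropy(head(finetuned(x_adv)), y)
    grad, = torch.autograd.grad(loss_clean, x_adv)
    x_adv = (x_adv + 0.05 * grad.sign()).detach()

    # Task loss on the adversarial input, plus a retention term tying the
    # fine-tuned features back to the frozen pre-trained features.
    feats = finetuned(x_adv)
    loss = F.cross_entropy(head(feats), y) + 0.5 * info_nce(feats, pretrained(x))
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(f"step {step}: loss={loss.item():.4f}")
```

The frozen copy serves only as a reference; just the fine-tuned encoder and the task head are updated, which is what lets the retention term counteract catastrophic forgetting while adversarial training handles robustness.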

Related research

10/18/2022
ROSE: Robust Selective Fine-tuning for Pre-trained Language Models
Even though the large-scale language models have achieved excellent perf...

10/28/2022
RoChBert: Towards Robust BERT Fine-tuning for Chinese
Despite the superb performance on a wide range of tasks, pre-trained ...

10/18/2022
Fine-mixing: Mitigating Backdoors in Fine-tuned Language Models
Deep Neural Networks (DNNs) are known to be vulnerable to backdoor attac...

09/28/2020
Domain Adversarial Fine-Tuning as an Effective Regularizer
In Natural Language Processing (NLP), pre-trained language models (LMs) ...

09/19/2023
Investigating the Catastrophic Forgetting in Multimodal Large Language Models
Following the success of GPT4, there has been a surge in interest in mul...

10/05/2020
InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective
Large-scale language models such as BERT have achieved state-of-the-art ...

05/08/2023
Diffusion Theory as a Scalpel: Detecting and Purifying Poisonous Dimensions in Pre-trained Language Models Caused by Backdoor or Bias
Pre-trained Language Models (PLMs) may be poisonous with backdoors or bi...
