NoisyTune: A Little Noise Can Help You Finetune Pretrained Language Models Better
Effectively finetuning pretrained language models (PLMs) is critical for their success in downstream tasks. However, PLMs may have risks in overfitting pretraining signals, and there are some gaps between downstream tasks and the pretraining tasks. It can be difficult for vanilla finetuning methods to overcome the barrier between pretraining and downstream tasks, which leads to suboptimal performance. In this paper, we propose a very simple yet effective method named NoisyTune which can help better finetune PLMs in downstream tasks by adding some noise to the parameters of PLMs before finetuning. More specifically, we propose a matrix-wise perturbing method by adding different uniform noises according to the standard deviations of different parameter matrices, which can consider the varied characteristics of different types of parameters in PLMs. Extensive experiments on the GLUE English benchmark and the XTREME multilingual benchmark show that NoisyTune can consistently improve the performance of different PLMs in many downstream tasks.
READ FULL TEXT