On the Interplay Between Fine-tuning and Sentence-level Probing for Linguistic Knowledge in Pre-trained Transformers

by Marius Mosbach, et al.

Fine-tuning pre-trained contextualized embedding models has become an integral part of the NLP pipeline. At the same time, probing has emerged as a way to investigate the linguistic knowledge captured by pre-trained models. Very little is understood, however, about how fine-tuning affects the representations of pre-trained models and thereby the linguistic knowledge they encode. This paper contributes towards closing this gap. We study three different pre-trained models: BERT, RoBERTa, and ALBERT, and investigate through sentence-level probing how fine-tuning affects their representations. We find that for some probing tasks fine-tuning leads to substantial changes in accuracy, possibly suggesting that fine-tuning introduces or even removes linguistic knowledge from a pre-trained model. These changes, however, vary greatly across models, fine-tuning tasks, and probing tasks. Our analysis reveals that while fine-tuning indeed changes the representations of a pre-trained model, and these changes are typically larger for higher layers, only in very few cases does fine-tuning have a positive effect on probing accuracy that exceeds simply using the pre-trained model with a strong pooling method. Based on our findings, we argue that both positive and negative effects of fine-tuning on probing accuracy require careful interpretation.
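To make the setup concrete, a sentence-level probe of the kind described above typically freezes the model, pools its token representations into a single sentence vector (mean pooling is one common "strong pooling method"), and trains a simple classifier on top. The sketch below is purely illustrative and is not the authors' code: it stands in for the frozen transformer with random token embeddings (the synthetic data, the class shift of 0.5, and all sizes are assumptions), but the pooling-plus-linear-probe pipeline is the standard recipe.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in for frozen transformer output: for each of 200 "sentences",
# a (num_tokens, hidden_size) matrix of contextualized token vectors.
# In a real probe these would come from a fixed layer of BERT/RoBERTa/ALBERT.
hidden_size = 32
sentences = [rng.normal(size=(rng.integers(5, 15), hidden_size))
             for _ in range(200)]

# Synthetic binary probing labels; positive examples get a small shift
# so the probe has linearly decodable signal to find (an assumption made
# here only so the toy example is non-trivial).
labels = rng.integers(0, 2, size=200)
sentences = [s + 0.5 * y for s, y in zip(sentences, labels)]

def mean_pool(token_vectors):
    # Sentence-level representation: average over the token axis.
    return token_vectors.mean(axis=0)

X = np.stack([mean_pool(s) for s in sentences])
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, random_state=0)

# The probe itself: a linear classifier trained on frozen features.
# Probing accuracy on held-out data is the quantity compared before
# and after fine-tuning.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = probe.score(X_test, y_test)
print(f"probing accuracy: {accuracy:.2f}")
```

Comparing this held-out accuracy computed from a layer's representations before fine-tuning against the same quantity after fine-tuning is the basic measurement the paper builds on.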




What Happens To BERT Embeddings During Fine-tuning?

While there has been much recent work studying how linguistic informatio...

AdapterHub: A Framework for Adapting Transformers

The current modus operandi in NLP involves downloading and fine-tuning p...

How Does Fine-tuning Affect the Geometry of Embedding Space: A Case Study on Isotropy

It is widely accepted that fine-tuning pre-trained language models usual...

On the Importance of Data Size in Probing Fine-tuned Models

Several studies have investigated the reasons behind the effectiveness o...

On the contribution of pre-trained models to accuracy and utility in modeling distributed energy resources

Despite their growing popularity, data-driven models of real-world dynam...

SE3M: A Model for Software Effort Estimation Using Pre-trained Embedding Models

Estimating effort based on requirement texts presents many challenges, e...

Revisiting the Updates of a Pre-trained Model for Few-shot Learning

Most of the recent few-shot learning algorithms are based on transfer le...
