Fine-Tuned Transformers Show Clusters of Similar Representations Across Layers

09/17/2021
by Jason Phang, et al.

Despite the success of fine-tuning pretrained language encoders like BERT for downstream natural language understanding (NLU) tasks, it is still poorly understood how fine-tuning changes the underlying networks. In this work, we use centered kernel alignment (CKA), a method for comparing learned representations, to measure the similarity of representations in task-tuned models across layers. In experiments across twelve NLU tasks, we discover a consistent block-diagonal structure in the similarity of representations within fine-tuned RoBERTa and ALBERT models, with strong similarity within clusters of earlier and later layers, but not between them. The strong similarity of later-layer representations suggests that these layers contribute only marginally to task performance, and we verify experimentally that the top few layers of fine-tuned Transformers can be discarded without hurting performance, even with no further tuning.
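As a rough illustration of the analysis method, the sketch below implements linear CKA in NumPy and shows how one might build the layer-by-layer similarity matrix described above. The function name `linear_cka`, the `acts` variable, and the choice of the linear (rather than RBF) kernel are illustrative assumptions for this sketch, not the paper's released code.

```python
import numpy as np

def linear_cka(x, y):
    """Linear CKA between two representation matrices of shape
    (n_examples, dim). Returns a value in [0, 1]; higher means the two
    sets of representations are more similar, up to orthogonal
    transformation and isotropic scaling."""
    # Center each feature dimension across examples.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    numerator = np.linalg.norm(y.T @ x, ord="fro") ** 2
    denominator = (np.linalg.norm(x.T @ x, ord="fro") *
                   np.linalg.norm(y.T @ y, ord="fro"))
    return numerator / denominator

# Hypothetical usage: acts[l] holds layer l's hidden states for a batch of
# task examples, shape (n_examples, hidden_dim). The resulting matrix is
# the kind of layer-vs-layer similarity map in which a block-diagonal
# structure would appear as two bright on-diagonal blocks.
# sim = np.array([[linear_cka(acts[i], acts[j]) for j in range(len(acts))]
#                 for i in range(len(acts))])
```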

Related research

11/08/2019  What Would Elsa Do? Freezing Layers During Transformer Fine-Tuning
Pretrained transformer-based language models have achieved state of the ...

06/09/2023  Understanding the Benefits of Image Augmentations
Image Augmentations are widely used to reduce overfitting in neural netw...

01/02/2019  Visualizing Deep Similarity Networks
For convolutional neural network models that optimize an image embedding...

03/14/2023  Feature representations useful for predicting image memorability
Predicting image memorability has attracted interest in various fields. ...

10/19/2020  BERTnesia: Investigating the capture and forgetting of knowledge in BERT
Probing complex language models has recently revealed several insights i...

05/03/2020  Similarity Analysis of Contextual Word Representation Models
This paper investigates contextual word representation models from the l...

06/01/2021  Comparing Test Sets with Item Response Theory
Recent years have seen numerous NLP datasets introduced to evaluate the ...
