Attention-likelihood relationship in transformers

03/15/2023
by Valeria Ruscio, et al.

We analyze how large language models (LLMs) represent out-of-context words, investigating how much they rely on the surrounding context to capture the semantics of such words. Our likelihood-guided text perturbations expose a correlation between token likelihood and attention values in transformer-based language models. Extensive experiments show that unexpected tokens cause the model to attend less to the information coming from the tokens themselves when computing their representations, particularly at higher layers. These findings have valuable implications for assessing the robustness of LLMs in real-world scenarios. Fully reproducible codebase at https://github.com/Flegyas/AttentionLikelihood.
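The core measurement can be illustrated with a minimal sketch (not the authors' released code, which lives in the linked repository): for each token in a sentence, estimate its likelihood under the model and how much attention the token's position pays to itself when its representation is computed, then correlate the two. The choice of "gpt2" as the model and the single perturbed sentence are illustrative assumptions, not the paper's exact setup.

```python
# Sketch: correlate per-token likelihood with self-attention at the last layer.
# Assumes a causal LM ("gpt2") purely for illustration.
import torch
from scipy.stats import spearmanr
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative choice, not necessarily the paper's model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_attentions=True)
model.eval()

# A sentence with an unexpected ("out-of-context") final word as a toy perturbation.
text = "The chef seasoned the soup with a pinch of gravel."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Token likelihoods: log-probability assigned to each token given its prefix.
logits = outputs.logits[0]                     # (seq_len, vocab)
ids = inputs["input_ids"][0]                   # (seq_len,)
log_probs = torch.log_softmax(logits, dim=-1)
# Position i predicts token i+1, so align predictions with the next token.
token_logp = log_probs[:-1].gather(1, ids[1:, None]).squeeze(1)

# Attention each token's position pays to itself, averaged over heads, last layer.
last_layer_attn = outputs.attentions[-1][0]    # (heads, seq_len, seq_len)
self_attn = last_layer_attn.mean(dim=0).diagonal()[1:]  # align with token_logp

rho, p = spearmanr(token_logp.numpy(), self_attn.numpy())
print(f"Spearman rho between token log-likelihood and self-attention: {rho:.3f} (p={p:.3g})")
```

Under the paper's finding, low-likelihood (unexpected) tokens should tend to receive lower self-attention at higher layers; repeating the measurement over many perturbed sentences and all layers would reproduce the layer-wise trend reported in the paper.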


