Feed-Forward Blocks Control Contextualization in Masked Language Models

02/01/2023
by Goro Kobayashi, et al.

Understanding the inner workings of neural network models is a crucial step toward rationalizing their outputs and refining their architectures. Transformer-based models are at the core of recent natural language processing and have typically been analyzed through their attention patterns, since their defining feature is contextualizing input words with respect to their surroundings via attention mechanisms. In this study, we analyze this internal contextualization by considering all of the components, including the feed-forward block (i.e., a feed-forward layer and its surrounding residual and normalization layers) as well as attention. Our experiments with masked language models show that each of these previously overlooked components does modify the degree of contextualization when processing certain word-word pairs (e.g., those consisting of named entities). Furthermore, we find that some components cancel each other's effects. These results could update the typical view of each component's role in the Transformer layer (e.g., that attention performs contextualization while the other components serve different purposes).

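To make concrete what the abstract calls a "feed-forward block", the minimal PyTorch sketch below shows a position-wise feed-forward layer together with its surrounding residual connection and layer normalization, arranged post-LN as in BERT-style masked language models. The class name, GELU activation, and hidden sizes (768/3072) are illustrative assumptions for a BERT-base-like setting, not the authors' analysis code.

```python
# Minimal sketch of a post-LN "feed-forward block": feed-forward layer plus
# its surrounding residual connection and layer normalization.
# Names and dimensions are illustrative, not taken from the paper's code.
import torch
import torch.nn as nn


class FeedForwardBlock(nn.Module):
    def __init__(self, d_model: int = 768, d_ff: int = 3072):
        super().__init__()
        # Position-wise feed-forward layer (applied independently at each token).
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
        )
        # Normalization layer applied after the residual sum (post-LN, BERT-style).
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The residual connection mixes the feed-forward transformation with the
        # unchanged input representation; this mixing is why the residual and
        # normalization layers can strengthen or weaken the contextualization
        # produced by the preceding attention block.
        return self.norm(x + self.ff(x))


if __name__ == "__main__":
    hidden = torch.randn(2, 16, 768)   # (batch, sequence length, hidden size)
    out = FeedForwardBlock()(hidden)
    print(out.shape)                   # torch.Size([2, 16, 768])
```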

research
05/22/2023

Parallel Attention and Feed-Forward Net Design for Pre-training and Inference on Transformers

In this paper, we introduce Parallel Attention and Feed-Forward Net Desi...
research
09/15/2021

Incorporating Residual and Normalization Layers into Analysis of Masked Language Models

Transformer architecture has become ubiquitous in the natural language p...
research
02/24/2023

Analyzing And Editing Inner Mechanisms Of Backdoored Language Models

Recent advancements in interpretability research made transformer langua...
research
06/10/2021

GroupBERT: Enhanced Transformer Architecture with Efficient Grouped Structures

Attention based language models have become a critical component in stat...
research
12/29/2020

Transformer Feed-Forward Layers Are Key-Value Memories

Feed-forward layers constitute two-thirds of a transformer model's param...
research
03/28/2022

Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space

Transformer-based language models (LMs) are at the core of modern NLP, b...
research
09/09/2020

Pay Attention when Required

Transformer-based models consist of interleaved feed-forward blocks - th...
