One Wide Feedforward is All You Need

09/04/2023
by Telmo Pessoa Pires, et al.

The Transformer architecture has two main non-embedding components: Attention and the Feed Forward Network (FFN). Attention captures interdependencies between words regardless of their position, while the FFN non-linearly transforms each input token independently. In this work, we explore the role of the FFN and find that, despite taking up a significant fraction of the model's parameters, it is highly redundant. Concretely, we are able to substantially reduce the number of parameters with only a modest drop in accuracy by removing the FFN from the decoder layers and sharing a single FFN across the encoder. Finally, we scale this architecture back to its original size by increasing the hidden dimension of the shared FFN, achieving substantial gains in both accuracy and latency with respect to the original Transformer Big.
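
The parameter trade-off described above can be illustrated with a rough back-of-the-envelope sketch. It assumes the commonly cited Transformer Big dimensions (model width 1024, FFN hidden width 4096, 6 encoder and 6 decoder layers) and a two-layer FFN with biases; none of these numbers appear in the abstract, and the helper names `ffn_params` and `widened_d_ff` are hypothetical. It counts FFN parameters only, ignoring attention, embeddings, and layer normalization.

```python
def ffn_params(d_model: int, d_ff: int) -> int:
    """Parameters of a two-layer position-wise FFN (weights plus biases)."""
    return d_model * d_ff + d_ff + d_ff * d_model + d_model


def widened_d_ff(budget: int, d_model: int) -> int:
    """Largest hidden width whose single FFN fits within `budget` parameters.

    Solves d_ff * (2 * d_model + 1) + d_model <= budget for d_ff.
    """
    return (budget - d_model) // (2 * d_model + 1)


# Assumed dimensions (not taken from the abstract).
D_MODEL, D_FF = 1024, 4096
N_ENC, N_DEC = 6, 6

# Baseline: one FFN per encoder layer and one per decoder layer.
baseline = (N_ENC + N_DEC) * ffn_params(D_MODEL, D_FF)

# Reduced model: decoder FFNs removed, one FFN shared by all encoder layers.
shared = ffn_params(D_MODEL, D_FF)

# Widened model: spend the baseline FFN budget on a single shared FFN.
wide = widened_d_ff(baseline, D_MODEL)

print(f"baseline FFN parameters: {baseline:,}")
print(f"single shared FFN:       {shared:,}")
print(f"hidden width that restores the budget: {wide:,}")
```

Under these assumptions, the single shared FFN holds one twelfth of the baseline FFN parameters, and restoring the budget requires a hidden width roughly twelve times larger. Because the FFN is applied to each token independently, one wide FFN is a single pair of large matrix multiplications per encoder layer rather than a different set of weights per layer.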

