ETC: Encoding Long and Structured Data in Transformers

by Joshua Ainslie, et al.

Transformer-based models have pushed the state of the art in many natural language processing tasks. However, one of their main limitations is the quadratic computational and memory cost of the standard attention mechanism. In this paper, we present a new family of Transformer models, which we call the Extended Transformer Construction (ETC), that allows for significant increases in input sequence length by introducing a new global-local attention mechanism between a global memory and the standard input tokens. We also show that combining global-local attention with relative position encodings allows ETC to handle structured data with ease. Empirical results on the Natural Questions dataset show the promise of the approach.
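The global-local attention pattern described in the abstract can be illustrated with a small sketch. This is not the authors' implementation; the function name and parameters below are illustrative. The idea, under the stated assumptions, is that global tokens attend to all tokens, long-input tokens attend to all global tokens, and long-to-long attention is restricted to a sliding local window, making the cost linear in the long input length rather than quadratic:

```python
import numpy as np

def etc_attention_mask(num_global: int, num_long: int, radius: int) -> np.ndarray:
    """Boolean attention mask sketching ETC-style global-local attention.

    Rows are queries, columns are keys, with global tokens first.
    Global-to-global, global-to-long, and long-to-global attention are
    unrestricted; long-to-long attention is a window of `radius` tokens
    on each side.
    """
    n = num_global + num_long
    mask = np.zeros((n, n), dtype=bool)
    # Global queries attend to everything (g2g and g2l).
    mask[:num_global, :] = True
    # Long queries attend to every global token (l2g).
    mask[num_global:, :num_global] = True
    # Long-to-long: sliding window of +/- radius positions (l2l).
    idx = np.arange(num_long)
    mask[num_global:, num_global:] = np.abs(idx[:, None] - idx[None, :]) <= radius
    return mask
```

With `num_long` tokens and window radius `r`, each long query attends to at most `num_global + 2r + 1` keys, so total attention cost grows linearly with input length once the number of global tokens is fixed.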



