Adding Recurrence to Pretrained Transformers for Improved Efficiency and Context Size

08/16/2020
by Davis Yoshida, et al.

Fine-tuning a pretrained transformer for a downstream task has become a standard method in NLP in the last few years. While the results from these models are impressive, applying them can be extremely computationally expensive, as is pretraining new models with the latest architectures. We present a novel method for applying pretrained transformer language models that lowers their memory requirement at both training and inference time. An additional benefit is that our method removes the fixed context size constraint that most transformer models have, allowing for more flexible use. When applied to the GPT-2 language model, we find that our method attains better perplexity than an unmodified GPT-2 model on the PG-19 and WikiText-103 corpora, for a given amount of computation or memory.
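The abstract describes wrapping a frozen pretrained language model in segment-level recurrence: a long input is processed in fixed-size chunks, and a compressed summary of each chunk is carried forward as state for the next one, so memory no longer grows with the full context. The sketch below illustrates that general idea only; it is not the authors' implementation. The ToyCausalLM stand-in, the mean-pooling compress() step, the single memory slot, and the segment length of 64 are all illustrative assumptions.

```python
# Minimal sketch of segment-level recurrence around a (stand-in) pretrained LM.
# Assumptions: ToyCausalLM replaces GPT-2, compress() is a simple mean-pool,
# and one memory vector per segment is carried forward. Not the paper's method.
import torch
import torch.nn as nn

class ToyCausalLM(nn.Module):
    """Untrained stand-in for a pretrained transformer LM (e.g. GPT-2)."""
    def __init__(self, vocab_size=100, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, ids, memory=None):
        x = self.embed(ids)                              # (B, T, D)
        if memory is not None:
            x = torch.cat([memory, x], dim=1)            # prepend recurrent state
        T = x.size(1)
        # Causal mask so each position only attends to itself and earlier ones.
        mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        h = self.encoder(x, mask=mask)
        if memory is not None:
            h = h[:, memory.size(1):]                    # drop memory positions
        return self.lm_head(h), h

def compress(hidden, mem_slots=1):
    """Squash a segment's hidden states into a few memory vectors (mean-pool)."""
    return hidden.mean(dim=1, keepdim=True).expand(-1, mem_slots, -1)

# Process a long sequence in fixed-size segments, carrying memory between them,
# so attention cost and memory are bounded by the segment length, not the
# total context length.
model = ToyCausalLM()
ids = torch.randint(0, 100, (1, 256))
memory = None
for segment in ids.split(64, dim=1):
    logits, hidden = model(segment, memory)
    # detach() keeps gradients from flowing across segments, bounding
    # training-time memory as well.
    memory = compress(hidden.detach())
    print(logits.shape, memory.shape)
```

Because the toy model is untrained, the logits are meaningless; the point of the sketch is the control flow: fixed-size segments, a constant-size carried state, and no dependence on a fixed maximum context length.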
