Frustratingly Short Attention Spans in Neural Language Modeling

02/15/2017
by Michał Daniluk, et al.

Neural language models predict the next token using a latent representation of the immediate token history. Recently, various methods for augmenting neural language models with an attention mechanism over a differentiable memory have been proposed. For predicting the next token, these models query information from a memory of the recent history, which can facilitate learning mid- and long-range dependencies. However, conventional attention mechanisms used in memory-augmented neural language models produce a single output vector per time step. This vector is used both for predicting the next token and as the key and value of a differentiable memory of the token history. In this paper, we propose a neural language model with a key-value attention mechanism that outputs separate representations for the key and value of a differentiable memory, as well as for encoding the next-word distribution. This model outperforms existing memory-augmented neural language models on two corpora. Yet, we found that our method mainly utilizes a memory of the five most recent output representations. This led to the unexpected main finding that a much simpler model based only on the concatenation of recent output representations from previous time steps is on par with more sophisticated memory-augmented neural language models.
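
The abstract describes both mechanisms only at a high level, so the sketch below illustrates the two ideas side by side in plain NumPy; it is not the authors' implementation. Splitting each output into key, value, and predict parts, the window size L = 5, the dimensions, and the random placeholder weights are illustrative assumptions, and the concatenation baseline simply reuses the predict parts as its "recent output representations" to keep the toy self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 8        # size of each hidden-state part (assumed)
L = 5        # attention window over the most recent outputs (assumed)
vocab = 20   # toy vocabulary size (assumed)
T = 12       # number of time steps in the toy sequence

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

# Stand-in for RNN outputs: each h_t is split into (key, value, predict) parts.
H = rng.normal(size=(T, 3 * d))
keys, values, preds = H[:, :d], H[:, d:2 * d], H[:, 2 * d:]

# Output projections (random placeholders for trained weights).
W_out = rng.normal(size=(2 * d, vocab)) * 0.1   # attention context + predict part
W_cat = rng.normal(size=(L * d, vocab)) * 0.1   # concatenation baseline

def key_value_attention_logits(t):
    """Next-token logits at step t: the current key attends over the last L values."""
    lo = max(0, t - L)
    past_keys, past_vals = keys[lo:t], values[lo:t]
    if len(past_keys) == 0:
        context = np.zeros(d)
    else:
        scores = past_keys @ keys[t]      # dot-product attention scores
        alpha = softmax(scores)           # attention weights over the window
        context = alpha @ past_vals       # weighted sum of past value vectors
    return np.concatenate([context, preds[t]]) @ W_out

def concat_baseline_logits(t):
    """The simpler model behind the main finding: concatenate the L most recent
    output representations (no attention) and project them to the vocabulary."""
    window = preds[max(0, t - L + 1): t + 1]        # up to L most recent outputs
    feat = np.zeros(L * d)
    feat[: window.size] = window[::-1].reshape(-1)  # most recent first, zero-padded
    return feat @ W_cat

print(softmax(key_value_attention_logits(7)).round(3))
print(softmax(concat_baseline_logits(7)).round(3))
```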

Related research

Improving Neural Language Models with a Continuous Cache (12/13/2016)
We propose an extension to neural network language models to adapt their...

Attention-based Memory Selection Recurrent Network for Language Modeling (11/26/2016)
Recurrent neural networks (RNNs) have achieved great success in language...

Training Language Models with Memory Augmentation (05/25/2022)
Recent work has improved language models remarkably by equipping them wi...

MEMORY-VQ: Compression for Tractable Internet-Scale Memory (08/28/2023)
Retrieval augmentation is a powerful but expensive method to make langua...

Memorizing Transformers (03/16/2022)
Language models typically need to be trained or finetuned in order to ac...

SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification (05/16/2023)
The high computational and memory requirements of generative large langu...

Memory Augmented Large Language Models are Computationally Universal (01/10/2023)
We show that transformer-based large language models are computationally...
