Resurrecting Recurrent Neural Networks for Long Sequences

03/11/2023
by Antonio Orvieto, et al.

Recurrent Neural Networks (RNNs) offer fast inference on long sequences but are hard to optimize and slow to train. Deep state-space models (SSMs) have recently been shown to perform remarkably well on long sequence modeling tasks, and have the added benefits of fast parallelizable training and RNN-like fast inference. However, while SSMs are superficially similar to RNNs, there are important differences that make it unclear where their performance boost over RNNs comes from. In this paper, we show that careful design of deep RNNs using standard signal propagation arguments can recover the impressive performance of deep SSMs on long-range reasoning tasks, while also matching their training speed. To achieve this, we analyze and ablate a series of changes to standard RNNs including linearizing and diagonalizing the recurrence, using better parameterizations and initializations, and ensuring proper normalization of the forward pass. Our results provide new insights on the origins of the impressive performance of deep SSMs, while also introducing an RNN block called the Linear Recurrent Unit that matches both their performance on the Long Range Arena benchmark and their computational efficiency.
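To make the ingredients above concrete, the following is a minimal NumPy sketch of a diagonal linear recurrence with the exponential parameterization, stable-ring initialization, and normalization of the forward pass that the abstract describes. It is an illustration under our own naming (init_lru, lru_forward, r_min, and r_max are hypothetical labels), not the authors' reference implementation.

    import numpy as np

    def init_lru(d_in, d_state, r_min=0.9, r_max=0.999, seed=0):
        """Sketch of a stable-ring initialization: eigenvalue magnitudes
        |lambda| are drawn with |lambda|^2 uniform in [r_min^2, r_max^2]."""
        rng = np.random.default_rng(seed)
        u1 = rng.uniform(size=d_state)
        u2 = rng.uniform(size=d_state)
        # Parameterize lambda = exp(-exp(nu_log) + 1j * exp(theta_log)),
        # so |lambda| = exp(-exp(nu_log)) stays inside the unit disk.
        nu_log = np.log(-0.5 * np.log(u1 * (r_max**2 - r_min**2) + r_min**2))
        theta_log = np.log(2 * np.pi * u2)
        # Complex input/output projections B, C and a real skip connection D.
        B = (rng.normal(size=(d_state, d_in))
             + 1j * rng.normal(size=(d_state, d_in))) / np.sqrt(2 * d_in)
        C = (rng.normal(size=(d_in, d_state))
             + 1j * rng.normal(size=(d_in, d_state))) / np.sqrt(d_state)
        D = rng.normal(size=d_in)
        return nu_log, theta_log, B, C, D

    def lru_forward(params, u):
        """Apply the linear recurrence x_k = lambda * x_{k-1} + gamma * (B u_k)
        to a real sequence u of shape (T, d_in); sequential for clarity."""
        nu_log, theta_log, B, C, D = params
        lam = np.exp(-np.exp(nu_log) + 1j * np.exp(theta_log))  # diagonal recurrence
        gamma = np.sqrt(1.0 - np.abs(lam) ** 2)  # normalizes the forward pass
        x = np.zeros(B.shape[0], dtype=complex)
        outputs = []
        for u_k in u:
            x = lam * x + gamma * (B @ u_k)
            outputs.append((C @ x).real + D * u_k)
        return np.stack(outputs)

For example, lru_forward(init_lru(4, 64), np.random.randn(512, 4)) maps a length-512 sequence to an output of the same shape. The Python loop is only for readability: because the recurrence is linear and diagonal, it can be evaluated with a parallel scan, which is what gives such models their fast, parallelizable training.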
