Regularizing and Optimizing LSTM Language Models

08/07/2017
by Stephen Merity et al.

Recurrent neural networks (RNNs), such as long short-term memory networks (LSTMs), serve as a fundamental building block for many sequence learning tasks, including machine translation, language modeling, and question answering. In this paper, we consider the specific problem of word-level language modeling and investigate strategies for regularizing and optimizing LSTM-based models. We propose the weight-dropped LSTM, which uses DropConnect on hidden-to-hidden weights as a form of recurrent regularization. Further, we introduce NT-ASGD, a variant of the averaged stochastic gradient method, wherein the averaging trigger is determined using a non-monotonic condition as opposed to being tuned by the user. Using these and other regularization strategies, we achieve state-of-the-art word-level perplexities on two data sets: 57.3 on Penn Treebank and 65.8 on WikiText-2. In exploring the effectiveness of a neural cache in conjunction with our proposed model, we achieve an even lower state-of-the-art perplexity of 52.8 on Penn Treebank and 52.0 on WikiText-2.
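
The two central ideas of the abstract can be sketched in a few lines of code. The following is a minimal illustration, not the authors' released implementation: the class name WeightDropLSTM, the initialization scheme, and the weight_dropout parameter are all assumptions made for the example. The key point it shows is that DropConnect drops entries of the hidden-to-hidden weight matrix itself (sampling one mask per forward pass and reusing it across time steps), rather than dropping the hidden activations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightDropLSTM(nn.Module):
    """Single-layer LSTM sketch with DropConnect on the hidden-to-hidden
    weights. A fresh mask is sampled once per forward pass and reused at
    every time step of the sequence. Illustrative only."""

    def __init__(self, input_size, hidden_size, weight_dropout=0.5):
        super().__init__()
        self.hidden_size = hidden_size
        self.weight_dropout = weight_dropout
        # Combined gate parameters, ordered (input, forget, cell, output).
        self.w_ih = nn.Parameter(torch.randn(4 * hidden_size, input_size) * 0.1)
        self.w_hh = nn.Parameter(torch.randn(4 * hidden_size, hidden_size) * 0.1)
        self.bias = nn.Parameter(torch.zeros(4 * hidden_size))

    def forward(self, inputs, state=None):
        # inputs: (seq_len, batch, input_size)
        seq_len, batch, _ = inputs.shape
        if state is None:
            h = inputs.new_zeros(batch, self.hidden_size)
            c = inputs.new_zeros(batch, self.hidden_size)
        else:
            h, c = state
        # DropConnect: drop entries of the recurrent weight matrix, not the
        # hidden activations; the same dropped weights are used below for
        # every time step of this forward pass.
        w_hh = F.dropout(self.w_hh, p=self.weight_dropout, training=self.training)
        outputs = []
        for t in range(seq_len):
            gates = inputs[t] @ self.w_ih.t() + h @ w_hh.t() + self.bias
            i, f, g, o = gates.chunk(4, dim=-1)
            c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
            h = torch.sigmoid(o) * torch.tanh(c)
            outputs.append(h)
        return torch.stack(outputs), (h, c)
```

The NT-ASGD trigger can likewise be sketched as a simple non-monotonic check, under the assumption that one validation loss is logged per evaluation; the function name and the nonmono argument are illustrative. Once the condition fires, training would switch from plain SGD to an averaging optimizer such as torch.optim.ASGD.

```python
def should_switch_to_asgd(val_losses, nonmono=5):
    """Non-monotonic trigger sketch for NT-ASGD: begin weight averaging when
    the latest validation loss is worse than the best loss recorded more than
    `nonmono` evaluations ago. `val_losses` holds one entry per evaluation,
    most recent last."""
    return (len(val_losses) > nonmono
            and val_losses[-1] > min(val_losses[:-nonmono]))
```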

Related research

Recurrent Neural Network Regularization (09/08/2014)
We present a simple regularization technique for Recurrent Neural Networ...

Multiplicative Models for Recurrent Language Modeling (06/30/2019)
Recently, there has been interest in multiplicative recurrent neural net...

Learning Intrinsic Sparse Structures within Long Short-Term Memory (09/15/2017)
Model compression is significant for the wide adoption of Recurrent Neur...

Comparing the Performance of the LSTM and HMM Language Models via Structural Similarity (07/09/2019)
Language models based on deep neural networks and traditional stochastic...

Regularizing RNNs by Stabilizing Activations (11/26/2015)
We stabilize the activations of Recurrent Neural Networks (RNNs) by pena...

MuFuRU: The Multi-Function Recurrent Unit (06/09/2016)
Recurrent neural networks such as the GRU and LSTM found wide adoption i...

Dual Rectified Linear Units (DReLUs): A Replacement for Tanh Activation Functions in Quasi-Recurrent Neural Networks (07/25/2017)
In this paper, we introduce a novel type of Rectified Linear Unit (ReLU)...
