Compression of Recurrent Neural Networks for Efficient Language Modeling

02/06/2019
by Artem M. Grachev, et al.

Recurrent neural networks have proved to be an effective method for statistical language modeling. However, in practice their memory consumption and run-time complexity are usually too large for real-time offline mobile applications. In this paper we consider several compression techniques for recurrent neural networks, including Long Short-Term Memory (LSTM) models. We pay particular attention to the high-dimensional output problem caused by very large vocabularies. We focus on compression methods that are effective in the context of on-device deployment: pruning, quantization, and matrix decomposition approaches (in particular, low-rank factorization and tensor train decomposition). For each model we investigate the trade-off between its size, its suitability for fast inference, and its perplexity. We propose a general pipeline for applying the most suitable methods to compress recurrent neural networks for language modeling. An experimental study on the Penn Treebank (PTB) dataset shows that matrix decomposition techniques give the most efficient results in terms of speed and the compression-perplexity balance.
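For intuition, below is a minimal sketch of the low-rank factorization idea applied to an output (softmax) weight matrix, where the large-vocabulary problem lives. This is not the authors' code: the matrix sizes, the chosen rank, and the helper name `low_rank_factorize` are illustrative assumptions.

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Approximate W (m x n) as U_r @ V_r with U_r (m x rank) and
    V_r (rank x n) via truncated SVD, cutting the parameter count
    from m*n to rank*(m + n)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * s[:rank]   # absorb singular values into the left factor
    V_r = Vt[:rank, :]
    return U_r, V_r

# Hypothetical sizes: a 10k-word vocabulary and 650 hidden units,
# typical for PTB language models; W stands in for a trained matrix.
vocab, hidden, rank = 10000, 650, 64
W = np.random.randn(vocab, hidden).astype(np.float32)
U_r, V_r = low_rank_factorize(W, rank)

original = W.size                  # 6.5M parameters
compressed = U_r.size + V_r.size   # ~0.68M parameters
print(f"compression ratio: {original / compressed:.1f}x")
print(f"relative error: {np.linalg.norm(W - U_r @ V_r) / np.linalg.norm(W):.3f}")
```

Replacing one m-by-n matrix with two factors changes the parameter count from m*n to rank*(m + n), so the saving grows with vocabulary size; in a network the two factors act as consecutive linear layers and are typically fine-tuned after factorization to recover perplexity.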

