Structurally Sparsified Backward Propagation for Faster Long Short-Term Memory Training

06/01/2018
by Maohua Zhu, et al.

Exploiting sparsity enables hardware systems to run neural networks faster and more energy-efficiently. However, most prior sparsity-centric optimization techniques only accelerate the forward pass of neural networks and usually require an even longer training process with iterative pruning and retraining. We observe that artificially inducing sparsity in the gradients of the gates in an LSTM cell has little impact on the training quality. Further, we can enforce structured sparsity in the gate gradients to make the LSTM backward pass up to 45% faster than the state-of-the-art sparsifying method on modern GPUs. Though the structured sparsifying method can impact the accuracy of a model, this performance gap can be eliminated by mixing our sparse training method with the standard dense training method. Experimental results show that the mixed method achieves comparable results in a shorter time span than purely dense training.
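The core idea can be illustrated with a short sketch. The snippet below is a minimal, hypothetical illustration (not the authors' implementation) of row-wise structured sparsification applied to the stacked LSTM gate gradients before the weight-gradient GEMM; the `sparsity` fraction and the max-magnitude row score are assumptions made for the example.

```python
# Minimal sketch of structurally sparsifying LSTM gate gradients.
# Assumptions (not from the abstract above): whole rows of the stacked
# gate-gradient matrix are zeroed when their max magnitude is small,
# and the drop fraction `sparsity` is a hypothetical hyperparameter.
import numpy as np

def sparsify_gate_gradients(dgates: np.ndarray, sparsity: float = 0.8) -> np.ndarray:
    """Zero whole rows of dgates so the surviving rows stay dense.

    dgates   : (batch, 4 * hidden) gradient w.r.t. the stacked i, f, g, o gates
    sparsity : fraction of rows to zero out (illustrative knob)
    """
    n_rows = dgates.shape[0]
    k = min(int(round(sparsity * n_rows)), n_rows)
    if k == 0:
        return dgates
    row_scores = np.abs(dgates).max(axis=1)      # per-row importance
    drop = np.argsort(row_scores)[:k]            # k least-important rows
    out = dgates.copy()
    out[drop, :] = 0.0                           # structured, row-wise zeros
    return out

# Toy usage: the sparsified gate gradients then feed the weight-gradient
# GEMM dW = x.T @ dgates, where zeroed rows can be skipped entirely.
rng = np.random.default_rng(0)
x = rng.standard_normal((32, 64))                # inputs for one time step
dgates = rng.standard_normal((32, 4 * 128))
sparse_dgates = sparsify_gate_gradients(dgates, sparsity=0.8)
dW = x.T @ sparse_dgates
print("surviving rows:", int((np.abs(sparse_dgates).sum(axis=1) > 0).sum()))
```

Because entire rows are zeroed rather than scattered elements, a GPU kernel can skip them in contiguous blocks, which is what makes the sparsity exploitable in practice.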


Related research

07/21/2020  SparseTrain: Exploiting Dataflow Sparsity for Efficient Convolutional Neural Networks Training
Training Convolutional Neural Networks (CNNs) usually requires a large n...

11/22/2019  SparseTrain: Leveraging Dynamic Sparsity in Training DNNs on General-Purpose SIMD Processors
Our community has greatly improved the efficiency of deep learning appli...

06/07/2021  Top-KAST: Top-K Always Sparse Training
Sparse neural networks are becoming increasingly important as the field ...

09/16/2021  Exploiting Activation based Gradient Output Sparsity to Accelerate Backpropagation in CNNs
Machine/deep-learning (ML/DL) based techniques are emerging as a driving...

06/20/2019  An Improved Trade-off Between Accuracy and Complexity with Progressive Gradient Pruning
Although deep neural networks (NNs) have achieved state-of-the-art accur...

05/25/2019  Bivariate Beta LSTM
Long Short-Term Memory (LSTM) infers the long term dependency through a ...

03/11/2022  DNN Training Acceleration via Exploring GPGPU Friendly Sparsity
The training phases of Deep neural network (DNN) consumes enormous proce...
