Memory-Efficient Backpropagation Through Time

06/10/2016
by Audrūnas Gruslys, et al.

We propose a novel approach to reducing the memory consumption of the backpropagation through time (BPTT) algorithm when training recurrent neural networks (RNNs). Our approach uses dynamic programming to balance the trade-off between caching intermediate results and recomputing them. The algorithm can fit tightly within almost any user-set memory budget while finding an optimal execution policy that minimizes the computational cost. Computational devices have limited memory capacity, and maximizing computational performance under a fixed memory budget is a practical use case. We provide asymptotic upper bounds on the computational cost for various regimes. The algorithm is particularly effective for long sequences: for sequences of length 1000, it saves 95% of memory usage while using only one third more time per iteration than standard BPTT.
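The abstract does not spell out the dynamic program itself. As a rough illustration only, the sketch below assumes a simplified cost model: only hidden states are cached (not internal activations), each cached state occupies one of m memory slots (the initial state included), and cost is counted as the number of forward-step evaluations. Under that assumption the optimal cost C(t, m) for a sequence of length t satisfies C(1, m) = 1, C(t, 1) = t(t+1)/2, and C(t, m) = min over 1 <= y < t of [ y + C(t - y, m - 1) + C(y, m) ]: run y steps forward, cache that hidden state, solve the right-hand part with one slot fewer, then free the slot and solve the left-hand part. The function name hsm_cost_table and this exact recurrence are hypothetical, not the paper's published code.

```python
from typing import List, Tuple


def hsm_cost_table(t_max: int, m_max: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Bottom-up DP over (memory slots m, sequence length t).

    Hypothetical cost model: cost[m][t] is the number of forward-step
    evaluations needed to backpropagate through t steps when at most m
    hidden states (the initial one included) may be cached at once.
    split[m][t] is the position y at which the next hidden state is cached;
    0 means no split (recompute from the start before every backward step).
    """
    cost = [[0] * (t_max + 1) for _ in range(m_max + 1)]
    split = [[0] * (t_max + 1) for _ in range(m_max + 1)]
    for m in range(1, m_max + 1):
        for t in range(1, t_max + 1):
            if t == 1:
                # A single step: one forward evaluation, then its backward.
                cost[m][t] = 1
            elif m == 1:
                # Only the initial state fits in memory, so step i needs
                # i forward evaluations: 1 + 2 + ... + t in total.
                cost[m][t] = t * (t + 1) // 2
            else:
                best_cost, best_y = None, 0
                for y in range(1, t):
                    # y forwards to reach position y, then the right part
                    # (t - y steps, one slot fewer), then the left part.
                    c = y + cost[m - 1][t - y] + cost[m][y]
                    if best_cost is None or c < best_cost:
                        best_cost, best_y = c, y
                cost[m][t] = best_cost
                split[m][t] = best_y
    return cost, split


if __name__ == "__main__":
    T, M = 100, 10
    cost, split = hsm_cost_table(T, M)
    for m in range(1, M + 1):
        print(f"slots={m:2d}  forward steps={cost[m][T]:5d}  first cache at y={split[m][T]}")
```

Following split[m][T] recursively yields the full caching schedule. The printed numbers come from this toy cost model and are not meant to reproduce the 95% memory and one-third time figures reported in the abstract.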

Related research

A Graph Theoretic Framework of Recomputation Algorithms for Memory-Efficient Backpropagation (05/28/2019)
Recomputation algorithms collectively refer to a family of methods that ...

Backpropagation for long sequences: beyond memory constraints with constant overheads (05/22/2018)
Naive backpropagation through time has a memory footprint that grows lin...

Efficient Real Time Recurrent Learning through combined activity and parameter sparsity (03/10/2023)
Backpropagation through time (BPTT) is the standard algorithm for traini...

Adaptively Truncating Backpropagation Through Time to Control Gradient Bias (05/17/2019)
Truncated backpropagation through time (TBPTT) is a popular method for l...

POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging (07/15/2022)
Fine-tuning models on edge devices like mobile phones would enable priva...

Low-memory stochastic backpropagation with multi-channel randomized trace estimation (06/13/2021)
Thanks to the combination of state-of-the-art accelerators and highly op...

Optimization of Generalized Jacobian Chain Products without Memory Constraints (03/12/2020)
The efficient computation of Jacobians represents a fundamental challeng...