ANODE: Unconditionally Accurate Memory-Efficient Gradients for Neural ODEs

02/27/2019
by Amir Gholami, et al.

Residual neural networks can be viewed as the forward Euler discretization of an Ordinary Differential Equation (ODE) with a unit time step. This has recently motivated researchers to explore other discretization approaches and to train ODE-based networks. However, an important challenge of neural ODEs is their prohibitive memory cost during gradient backpropagation. A recent method proposed in arXiv:1806.07366 claimed that this memory overhead can be reduced from O(L N_t), where N_t is the number of time steps and L is the depth of the network, down to O(L) by solving the forward ODE backwards in time. However, we show that this approach may lead to several problems: (i) it may be numerically unstable for ReLU/non-ReLU activations and general convolution operators, and (ii) the proposed optimize-then-discretize approach may lead to divergent training due to inconsistent gradients for small time step sizes. We discuss the underlying problems and, to address them, propose ANODE, a neural ODE framework that avoids these numerical instabilities. ANODE has a memory footprint of O(L) + O(N_t), with the same computational cost as the reverse ODE solve. Furthermore, we discuss a memory-efficient algorithm that can reduce this footprint further at the cost of additional computation. We show results on CIFAR-10/100 datasets using ResNet and SqueezeNext neural networks.
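As a rough illustration (not the authors' implementation), the sketch below shows the checkpointing idea behind the O(L) + O(N_t) footprint in PyTorch: only the activation entering each ODE block is stored (O(L) memory), and the N_t forward-Euler steps inside a block are recomputed when that block is backpropagated (O(N_t) extra memory), instead of recovering activations by integrating the ODE backwards in time. The names ODEBlock and CheckpointedODENet, the convolutional right-hand side f, and the step counts are illustrative assumptions, not part of the paper.

```python
# Minimal sketch of checkpointed backpropagation through an ODE-style network.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class ODEBlock(nn.Module):
    """One ODE 'layer': n_steps of forward Euler, z <- z + dt * f(z), over unit time."""
    def __init__(self, channels, n_steps=4):
        super().__init__()
        # Illustrative right-hand side f(z); any parameterized vector field works.
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.n_steps = n_steps
        self.dt = 1.0 / n_steps  # integrate over a unit time interval

    def forward(self, z):
        for _ in range(self.n_steps):
            z = z + self.dt * self.f(z)
        return z


class CheckpointedODENet(nn.Module):
    """Stack of ODE blocks; each block's internal time steps are recomputed
    during the backward pass instead of being stored."""
    def __init__(self, channels=16, depth=3):
        super().__init__()
        self.blocks = nn.ModuleList([ODEBlock(channels) for _ in range(depth)])

    def forward(self, z):
        for block in self.blocks:
            # Only the block input (checkpoint) is kept in memory; the block's
            # trajectory is rebuilt exactly, forward in time, when gradients
            # for this block are needed.
            z = checkpoint(block, z, use_reentrant=False)
        return z


# Usage sketch.
net = CheckpointedODENet()
x = torch.randn(2, 16, 32, 32, requires_grad=True)
net(x).sum().backward()
```

Recomputing each block's trajectory forward from a stored checkpoint avoids reversing a potentially dissipative ODE, at the price of roughly one extra forward pass per block.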

