Randomized Automatic Differentiation

07/20/2020
by Deniz Oktay et al.

The successes of deep learning, variational inference, and many other fields have been aided by specialized implementations of reverse-mode automatic differentiation (AD) to compute gradients of mega-dimensional objectives. The AD techniques underlying these tools were designed to compute exact gradients to numerical precision, but modern machine learning models are almost always trained with stochastic gradient descent. Why spend computation and memory on exact (minibatch) gradients only to use them for stochastic optimization? We develop a general framework and approach for randomized automatic differentiation (RAD), which allows unbiased gradient estimates to be computed with reduced memory in return for variance. We examine limitations of the general approach, and argue that we must leverage problem-specific structure to realize benefits. We develop RAD techniques for a variety of simple neural network architectures, and show that for a fixed memory budget, RAD converges in fewer iterations than using a small batch size for feedforward networks, and in a similar number for recurrent networks. We also show that RAD can be applied to scientific computing, and use it to develop a low-memory stochastic gradient method for optimizing the control parameters of a linear reaction-diffusion PDE representing a fission reactor.
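
The variance-for-memory trade can be illustrated with a minimal sketch (not the authors' implementation): for a single linear layer y = Wx, the weight gradient dL/dW = (dL/dy) x^T is linear in the stored activation x, so saving a randomly sparsified, rescaled copy of x in the forward pass yields an unbiased gradient estimate while reducing the memory held for the backward pass. The function names and `keep_prob` parameter below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def rad_linear_forward(W, x, keep_prob=0.25):
    """Forward pass of y = W @ x that stores a sparsified input.

    Each entry of x is kept with probability keep_prob and rescaled by
    1/keep_prob, so E[x_saved] == x. Storing x_saved sparsely would need
    less memory than the dense activation it replaces.
    """
    y = W @ x
    mask = rng.random(x.shape) < keep_prob
    x_saved = np.where(mask, x / keep_prob, 0.0)
    return y, x_saved

def rad_linear_backward(grad_y, x_saved):
    """Unbiased estimate of dL/dW = grad_y @ x^T (linear in x_saved)."""
    return np.outer(grad_y, x_saved)

# Sanity check: the randomized gradient is unbiased, so averaging many
# draws recovers the exact gradient up to Monte Carlo noise.
W = rng.standard_normal((3, 5))
x = rng.standard_normal(5)
g = rng.standard_normal(3)  # stand-in for dL/dy
exact = np.outer(g, x)
est = np.mean([rad_linear_backward(g, rad_linear_forward(W, x)[1])
               for _ in range(20000)], axis=0)
print(np.max(np.abs(est - exact)))  # small, shrinking as draws grow
```

The sketch keeps x_saved dense for clarity; the memory saving is realized only when the sparsified activation is actually stored in a sparse format. What it does demonstrate is the unbiasedness that makes trading memory for gradient variance valid under stochastic optimization.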


Related research

04/01/2021 · Storchastic: A Framework for General Stochastic Automatic Differentiation
Modelers use automatic differentiation of computation graphs to implemen...

02/17/2022 · Gradients without Backpropagation
Using backpropagation to compute gradients of objective functions for op...

07/09/2019 · SVGD: A Virtual Gradients Descent Method for Stochastic Optimization
Inspired by dynamic programming, we propose Stochastic Virtual Gradient ...

03/27/2018 · Demystifying Differentiable Programming: Shift/Reset the Penultimate Backpropagator
Deep learning has seen tremendous success over the past decade in comput...

12/13/2022 · ADEV: Sound Automatic Differentiation of Expected Values of Probabilistic Programs
Optimizing the expected values of probabilistic processes is a central p...

06/20/2016 · Benchmarking Python Tools for Automatic Differentiation
In this paper we compare several Python tools for automatic differentiat...

10/16/2022 · Automatic Differentiation of Programs with Discrete Randomness
Automatic differentiation (AD), a technique for constructing new program...
