Distributed Newton Methods for Deep Neural Networks

02/01/2018
by Chien-Chih Wang, et al.

Deep learning involves a difficult non-convex optimization problem with a large number of weights between any two adjacent layers of a deep structure. To handle large data sets or complicated networks, distributed training is needed, but the calculation of function, gradient, and Hessian values is expensive. In particular, the communication and synchronization costs may become a bottleneck. In this paper, we focus on situations where the model is stored in a distributed manner, and propose a novel distributed Newton method for training deep neural networks. Through variable and feature-wise data partitions and careful design, we are able to explicitly use the Jacobian matrix for matrix-vector products in the Newton method. Several techniques are incorporated to reduce the running time as well as the memory consumption. First, to reduce the communication cost, we propose a diagonalization method such that an approximate Newton direction can be obtained without communication between machines. Second, we consider subsampled Gauss-Newton matrices to reduce both the running time and the communication cost. Third, to reduce the synchronization cost, we terminate the process of finding an approximate Newton direction even if some nodes have not finished their tasks. Details of several implementation issues in distributed environments are thoroughly investigated. Experiments demonstrate that the proposed method is effective for the distributed training of deep neural networks. Compared with stochastic gradient methods, it is more robust and may give better test accuracy.
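To make the second technique concrete, the following is a minimal NumPy sketch of a subsampled Gauss-Newton matrix-vector product, the basic operation that an iterative solver such as conjugate gradient repeats to obtain an approximate Newton direction. The function name, array shapes, and damping term are illustrative assumptions, not the paper's distributed implementation.

import numpy as np

# Illustrative sketch (assumed shapes, not the authors' code): for a sampled
# subset S of examples, the Gauss-Newton matrix is G = (1/|S|) sum_i J_i^T B_i J_i,
# where J_i is the Jacobian of the network outputs w.r.t. the weights and B_i
# is the Hessian of the loss w.r.t. the network outputs for example i.
def subsampled_gauss_newton_vec(J_S, B_S, v, damping=1e-3):
    # J_S: (s, C, n) per-example Jacobians, B_S: (s, C, C) per-example loss
    # Hessians, v: (n,) direction; returns (G + damping * I) v.
    s = J_S.shape[0]
    Jv = np.einsum('scn,n->sc', J_S, v)        # J_i v for every sampled example
    BJv = np.einsum('scd,sd->sc', B_S, Jv)     # B_i (J_i v)
    Gv = np.einsum('scn,sc->n', J_S, BJv) / s  # (1/|S|) sum_i J_i^T B_i (J_i v)
    return Gv + damping * v

In a conjugate gradient loop, only products of this form are needed, which is why subsampling the examples that define G directly reduces both the computation and the amount of data that machines must exchange.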

Related research

11/14/2018
Newton Methods for Convolutional Neural Networks
Deep learning involves a difficult non-convex optimization problem, whic...

06/05/2019
Efficient Subsampled Gauss-Newton and Natural Gradient Methods for Training Neural Networks
We present practical Levenberg-Marquardt variants of Gauss-Newton and na...

12/02/2021
Newton methods based convolution neural networks using parallel processing
Training of convolutional neural networks is a high dimensional and a no...

06/16/2020
Practical Quasi-Newton Methods for Training Deep Neural Networks
We consider the development of practical stochastic quasi-Newton, and in...

04/06/2022
A Hessian inversion-free exact second order method for distributed consensus optimization
We consider a standard distributed consensus optimization problem where ...

10/26/2020
An Efficient Newton Method for Extreme Similarity Learning with Nonlinear Embeddings
We study the problem of learning similarity by using nonlinear embedding...

07/22/2019
Practical Newton-Type Distributed Learning using Gradient Based Approximations
We study distributed algorithms for expected loss minimization where the...