BDA-PCH: Block-Diagonal Approximation of Positive-Curvature Hessian for Training Neural Networks

02/19/2018
by Sheng-Wei Chen, et al.

We propose a block-diagonal approximation of the positive-curvature Hessian (BDA-PCH) matrix to measure curvature. Our proposed BDA-PCH matrix is memory efficient and can be applied to any fully-connected neural network whose activation and criterion functions are twice differentiable. In particular, the BDA-PCH matrix can handle non-convex criterion functions. We devise an efficient scheme that uses the conjugate gradient method to derive Newton directions in the mini-batch setting. Empirical studies show that our method outperforms competing second-order methods in convergence speed.
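The conjugate-gradient step the abstract describes can be sketched compactly. The following is a minimal NumPy illustration, not the paper's code: it treats the curvature matrix as block-diagonal, so each layer contributes an independent matrix-vector product, and it solves the damped system (G + lambda*I) d = -g by conjugate gradient. The function names, the damping term, and the toy PSD blocks in the usage example are assumptions for illustration; the actual PCH blocks come from the paper's construction.

import numpy as np

def cg_newton_direction(block_matvecs, grad_blocks, damping=1e-3,
                        max_iters=50, tol=1e-8):
    """Solve (G + damping*I) d = -g with conjugate gradient, where G is
    block-diagonal: each block acts only on its own layer's parameters.
    block_matvecs is a list of functions v -> G_i @ v (one per block);
    grad_blocks is the gradient split the same way. Illustrative sketch,
    not the paper's exact PCH construction."""
    def matvec(v_blocks):
        # Block-diagonal curvature-vector product plus damping; the damping
        # keeps the system positive definite, which CG requires.
        return [mv(v) + damping * v for mv, v in zip(block_matvecs, v_blocks)]

    def dot(a, b):
        # Inner product over the stacked block vector.
        return sum(float(x @ y) for x, y in zip(a, b))

    # Standard CG on the stacked system; blocks never interact, so memory
    # stays per-layer instead of quadratic in the total parameter count.
    x = [np.zeros_like(g) for g in grad_blocks]
    r = [-g for g in grad_blocks]          # residual b - A x with x = 0, b = -g
    p = [ri.copy() for ri in r]
    rs = dot(r, r)
    for _ in range(max_iters):
        Ap = matvec(p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

# Toy usage: two layers with PSD per-layer curvature blocks standing in
# for PCH blocks (hypothetical sizes, random data).
rng = np.random.default_rng(0)
blocks = []
for n in (5, 3):
    A = rng.standard_normal((n, n))
    G = A @ A.T                        # PSD block
    blocks.append(lambda v, G=G: G @ v)
grads = [rng.standard_normal(5), rng.standard_normal(3)]
direction = cg_newton_direction(blocks, grads)

Because the curvature-vector products are supplied as callables, nothing here requires the blocks to be materialized as dense matrices; in practice they can be computed matrix-free per layer, which is the source of the memory efficiency the abstract claims.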


Related research

Block-diagonal Hessian-free Optimization for Training Neural Networks (12/20/2017)
Second-order methods for neural network optimization have several advant...

Kronecker-factored Quasi-Newton Methods for Convolutional Neural Networks (02/12/2021)
Second-order methods have the capability of accelerating optimization by...

Estimating the Hessian by Back-propagating Curvature (06/27/2012)
In this work we develop Curvature Propagation (CP), a general technique ...

On the Convex Behavior of Deep Neural Networks in Relation to the Layers' Width (01/14/2020)
The Hessian of neural networks can be decomposed into a sum of two matri...

Diagonal Rescaling For Neural Networks (05/25/2017)
We define a second-order neural network stochastic gradient training alg...

ViViT: Curvature access through the generalized Gauss-Newton's low-rank structure (06/04/2021)
Curvature in form of the Hessian or its generalized Gauss-Newton (GGN) a...

SANE: The phases of gradient descent through Sharpness Adjusted Number of Effective parameters (05/29/2023)
Modern neural networks are undeniably successful. Numerous studies have ...
