Kronecker-factored Quasi-Newton Methods for Convolutional Neural Networks

02/12/2021
by Yi Ren, et al.

Second-order methods can accelerate optimization by exploiting much richer curvature information than first-order methods, but most are impractical in deep learning settings where the number of training parameters is huge. In this paper, we propose KF-QN-CNN, a new Kronecker-factored quasi-Newton method for training convolutional neural networks (CNNs), in which the Hessian is approximated by a layer-wise block-diagonal matrix and each layer's diagonal block is further approximated by a Kronecker product matching the structure of the Hessian restricted to that layer. New damping and Hessian-action techniques for BFGS are designed to handle the non-convexity and the particularly large size of the Kronecker matrices in CNN models, and convergence results are proved for a variant of KF-QN-CNN under relatively mild conditions. KF-QN-CNN has memory requirements comparable to those of first-order methods and much lower per-iteration time complexity than traditional second-order methods. Compared with state-of-the-art first- and second-order methods on several CNN models, KF-QN-CNN consistently exhibited superior performance in all of our tests.
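
To make the layer-wise Kronecker-factored preconditioning concrete, here is a minimal sketch (not the paper's exact KF-QN-CNN update) of how a curvature block approximated as A ⊗ G can precondition one layer's gradient without ever forming the full Kronecker product. The function name kron_factored_step, the factor names A and G, and the simple additive damping are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def kron_factored_step(grad_W, A, G, damping=1e-3, lr=1e-2):
    """Preconditioned update for one layer's weight gradient.

    grad_W : (d_out, d_in)  gradient of the layer's (unfolded) weight matrix
    A      : (d_in, d_in)   input-side curvature factor (assumed symmetric PSD)
    G      : (d_out, d_out) output-side curvature factor (assumed symmetric PSD)
    """
    # Damping keeps both factors safely invertible (guards against non-convexity).
    A_d = A + damping * np.eye(A.shape[0])
    G_d = G + damping * np.eye(G.shape[0])
    # Kronecker identity: (A ⊗ G)^{-1} vec(grad_W) = vec(G^{-1} grad_W A^{-1}),
    # so two small solves replace one huge (d_in*d_out)^2 inverse.
    left = np.linalg.solve(G_d, grad_W)        # G^{-1} grad_W, shape (d_out, d_in)
    update = np.linalg.solve(A_d, left.T).T    # G^{-1} grad_W A^{-1}
    return -lr * update

# Tiny usage example with random data standing in for a conv layer's
# unfolded weight gradient and its two curvature factors.
rng = np.random.default_rng(0)
d_out, d_in = 8, 18                     # e.g. 8 filters over an unfolded 3x3x2 patch
grad_W = rng.standard_normal((d_out, d_in))
X = rng.standard_normal((100, d_in))    # unfolded layer inputs
S = rng.standard_normal((100, d_out))   # backpropagated output-side signals
A = X.T @ X / 100                       # input-side factor estimate
G = S.T @ S / 100                       # output-side factor estimate
step = kron_factored_step(grad_W, A, G)
print(step.shape)                       # (8, 18)
```

Because only the two factors are stored and solved against, the memory cost per layer is on the order of d_in^2 + d_out^2 rather than (d_in * d_out)^2, which is why Kronecker-factored methods can keep memory requirements comparable to first-order optimizers.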


Related research

12/20/2017
Block-diagonal Hessian-free Optimization for Training Neural Networks
Second-order methods for neural network optimization have several advant...

02/19/2018
BDA-PCH: Block-Diagonal Approximation of Positive-Curvature Hessian for Training Neural Networks
We propose a block-diagonal approximation of the positive-curvature Hess...

04/11/2019
Modified online Newton step based on element wise multiplication
The second order method as Newton Step is a suitable technique in Online...

05/26/2022
Faster Optimization on Sparse Graphs via Neural Reparametrization
In mathematical optimization, second-order Newton's methods generally co...

05/23/2023
Layer-wise Adaptive Step-Sizes for Stochastic First-Order Methods for Deep Learning
We propose a new per-layer adaptive step-size procedure for stochastic f...

04/29/2020
WoodFisher: Efficient second-order approximations for model compression
Second-order information, in the form of Hessian- or Inverse-Hessian-vec...

05/23/2023
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
Given the massive cost of language model pre-training, a non-trivial imp...
