Revisiting Natural Gradient for Deep Networks

01/16/2013
by Razvan Pascanu, et al.

We evaluate natural gradient, an algorithm originally proposed in Amari (1997), for learning deep models. The contributions of this paper are as follows. We show the connection between natural gradient and three other recently proposed methods for training deep models: Hessian-Free (Martens, 2010), Krylov Subspace Descent (Vinyals and Povey, 2012), and TONGA (Le Roux et al., 2008). We describe how one can use unlabeled data to improve the generalization error obtained by natural gradient, and we empirically evaluate the robustness of the algorithm to the ordering of the training set compared to stochastic gradient descent. Finally, we extend natural gradient to incorporate second-order information alongside the manifold information and benchmark the new algorithm using a truncated Newton approach for inverting the metric matrix instead of a diagonal approximation of it.
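The truncated Newton approach referred to above is typically realized by solving the linear system F x = ∇L with conjugate gradient, so that the metric (Fisher) matrix is never formed or inverted explicitly; only Fisher-vector products are needed. Below is a minimal NumPy sketch of one such natural gradient step, under the assumption that the Fisher is approximated Gauss-Newton style as J^T J / N. The names (`fisher_vec_prod`, `jacobian`), the damping constant, and the learning rate are illustrative placeholders, not the authors' implementation.

```python
import numpy as np

def conjugate_gradient(fisher_vec_prod, grad, max_iters=50, tol=1e-6, damping=1e-4):
    """Truncated-Newton style CG solve of (F + damping * I) x = grad,
    using only Fisher-vector products (the metric is never materialized)."""
    x = np.zeros_like(grad)
    r = grad - (fisher_vec_prod(x) + damping * x)  # residual; equals grad at x = 0
    p = r.copy()
    rs_old = r @ r
    for _ in range(max_iters):
        Fp = fisher_vec_prod(p) + damping * p
        alpha = rs_old / (p @ Fp)
        x += alpha * p
        r -= alpha * Fp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

def natural_gradient_step(params, grad, jacobian, learning_rate=0.1):
    """One natural gradient update. `jacobian` (shape: N examples x P params)
    defines the empirical Fisher-vector product F v = J^T (J v) / N."""
    def fvp(v):
        return jacobian.T @ (jacobian @ v) / jacobian.shape[0]
    nat_grad = conjugate_gradient(fvp, grad)  # approximately F^{-1} grad
    return params - learning_rate * nat_grad
```

The point of the sketch is the design choice the abstract contrasts with a diagonal approximation: the inverse metric is applied implicitly through a few CG iterations of matrix-vector products, which scales to parameter counts where storing or inverting F would be infeasible.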
