Limitations of the Empirical Fisher Approximation

05/29/2019
by Frederik Kunstner, et al.

Natural gradient descent, which preconditions a gradient descent update with the Fisher information matrix of the underlying statistical model, is a way to capture partial second-order information. Several highly visible works have advocated an approximation known as the empirical Fisher, drawing connections between approximate second-order methods and heuristics like Adam. We dispute this argument by showing that the empirical Fisher (unlike the Fisher) does not generally capture second-order information. We further argue that the conditions under which the empirical Fisher approaches the Fisher (and the Hessian) are unlikely to be met in practice, and that, even on simple optimization problems, the pathologies of the empirical Fisher can have undesirable effects.
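
To make the contrast in the abstract concrete, the standard definitions are worth spelling out (a sketch in conventional notation; the data $\{(x_n, y_n)\}_{n=1}^N$, step size $\eta$, and loss $\mathcal{L}$ are illustrative symbols, not taken from this page). For a probabilistic model $p_\theta(y \mid x)$ trained by minimizing the negative log-likelihood $\mathcal{L}(\theta) = -\sum_{n=1}^N \log p_\theta(y_n \mid x_n)$, natural gradient descent preconditions each update with the inverse Fisher:

$$\theta_{t+1} = \theta_t - \eta \, F(\theta_t)^{-1} \nabla \mathcal{L}(\theta_t).$$

The Fisher information matrix takes an expectation over the model's own predictive distribution,

$$F(\theta) = \sum_{n=1}^{N} \mathbb{E}_{y \sim p_\theta(y \mid x_n)} \left[ \nabla_\theta \log p_\theta(y \mid x_n) \, \nabla_\theta \log p_\theta(y \mid x_n)^{\top} \right],$$

whereas the empirical Fisher replaces that expectation with the single observed label:

$$\widetilde{F}(\theta) = \sum_{n=1}^{N} \nabla_\theta \log p_\theta(y_n \mid x_n) \, \nabla_\theta \log p_\theta(y_n \mid x_n)^{\top}.$$

The expectation is what links $F$ to the curvature of $\mathcal{L}$; $\widetilde{F}$ approaches $F$ (and the Hessian) only under conditions such as the model matching the true conditional distribution near an optimum, which is exactly what the abstract argues is unlikely to hold in practice.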

