Combining Natural Gradient with Hessian Free Methods for Sequence Training

10/03/2018
by Adnan Haider, et al.

This paper presents a new optimisation approach for training Deep Neural Networks (DNNs) with discriminative sequence criteria. At each iteration, the method combines information from the Natural Gradient (NG) direction with local curvature information of the error surface, enabling better paths on the parameter manifold to be traversed. The method is derived from an alternative formulation of Taylor's theorem that uses the concepts of manifolds, tangent vectors and directional derivatives from the perspective of Information Geometry. The efficacy of the method is demonstrated within a Hessian Free (HF) style optimisation framework by sequence training both standard fully-connected DNNs and Time Delay Neural Networks as speech recognition acoustic models. It is shown that for the same number of updates the proposed approach achieves larger reductions in word error rate (WER) than both NG and HF, and also leads to a lower WER than standard stochastic gradient descent. The paper also addresses the issue of over-fitting that arises during sequence training of ReLU-DNN models due to the mismatch between the training criterion and WER.
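The HF-style framework referred to above relies on solving a linear system involving a curvature matrix using only matrix-vector products, so the matrix never has to be formed or inverted explicitly. As a minimal, hedged sketch of that ingredient (not the paper's actual NGHF update), the snippet below uses conjugate gradient to approximate a natural-gradient-style direction d = F⁻¹g, where F is a toy symmetric positive-definite stand-in for the Fisher or Gauss-Newton curvature matrix:

```python
import numpy as np

def conjugate_gradient(matvec, b, iters=50, tol=1e-10):
    """Approximately solve A x = b for SPD A, given only v -> A v products.

    This matrix-free structure is what makes HF-style methods practical:
    curvature-vector products can be computed without materialising A.
    """
    x = np.zeros_like(b)
    r = b - matvec(x)          # residual
    p = r.copy()               # search direction
    rs = r @ r
    for _ in range(iters):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)  # step length along p
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if rs_new < tol:       # residual small enough: stop early
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Toy SPD matrix standing in for the Fisher/Gauss-Newton curvature F,
# and a random gradient g (illustrative assumptions, not from the paper).
rng = np.random.default_rng(0)
n = 8
A = rng.standard_normal((n, n))
F = A @ A.T + n * np.eye(n)
g = rng.standard_normal(n)

# d approximates F^{-1} g, i.e. a natural-gradient-style direction.
d = conjugate_gradient(lambda v: F @ v, g)
```

In the full method, the gradient of the sequence criterion and curvature-vector products would be accumulated over mini-batches of utterances; the sketch only shows the linear-solver core shared by NG and HF approaches.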


