Coordinate descent on the orthogonal group for recurrent neural network training

07/30/2021
by Estelle Massart, et al.

We propose to use stochastic Riemannian coordinate descent on the orthogonal group for recurrent neural network training. The algorithm successively rotates two columns of the recurrent matrix, an operation that can be implemented efficiently as multiplication by a Givens matrix. When the coordinate is selected uniformly at random at each iteration, we prove convergence of the proposed algorithm under standard assumptions on the loss function, stepsize, and minibatch noise. In addition, we numerically demonstrate that the Riemannian gradient in recurrent neural network training has an approximately sparse structure. Leveraging this observation, we propose a faster variant of the algorithm that relies on the Gauss-Southwell rule. Experiments on a benchmark recurrent neural network training problem demonstrate the effectiveness of the proposed algorithm.
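The core update described in the abstract, rotating two columns of the recurrent matrix by a Givens rotation, can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the authors' implementation: the function names and the `grad_theta` oracle (a stochastic estimate of the derivative of the loss with respect to the rotation angle) are assumptions introduced here for clarity.

```python
import numpy as np

def givens_rotation_update(W, i, j, theta):
    # Right-multiply W by a Givens rotation acting on columns i and j.
    # This touches only two columns (O(n) work) and exactly preserves
    # the orthogonality of W.
    c, s = np.cos(theta), np.sin(theta)
    col_i, col_j = W[:, i].copy(), W[:, j].copy()
    W = W.copy()
    W[:, i] = c * col_i - s * col_j
    W[:, j] = s * col_i + c * col_j
    return W

def stochastic_coordinate_step(W, grad_theta, lr, rng):
    # One stochastic Riemannian coordinate descent step: select a column
    # pair uniformly at random, then rotate it against a stochastic
    # estimate of the loss derivative with respect to the rotation angle.
    # `grad_theta(W, i, j)` is a hypothetical oracle standing in for the
    # minibatch gradient computation.
    n = W.shape[1]
    i, j = rng.choice(n, size=2, replace=False)
    theta = -lr * grad_theta(W, i, j)
    return givens_rotation_update(W, i, j, theta)
```

Because each step multiplies by an exactly orthogonal matrix, the iterate stays on the orthogonal group to machine precision, with no retraction or re-orthogonalization needed. A Gauss-Southwell variant would replace the uniform `rng.choice` with the column pair carrying the largest Riemannian gradient entry.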


Related research

08/18/2017 · Accelerating recurrent neural network training using sequence bucketing and multi-GPU data parallelization
An efficient algorithm for recurrent neural network training is presente...

06/14/2016 · Recurrent neural network training with preconditioned stochastic gradient descent
This paper studies the performance of a recently proposed preconditioned...

11/04/2021 · Recurrent Neural Network Training with Convex Loss and Regularization Functions by Extended Kalman Filtering
We investigate the use of extended Kalman filtering to train recurrent n...

11/13/2020 · A Homotopy Coordinate Descent Optimization Method for l_0-Norm Regularized Least Square Problem
This paper proposes a homotopy coordinate descent (HCD) method to solve ...

06/09/2019 · Stochastic In-Face Frank-Wolfe Methods for Non-Convex Optimization and Sparse Neural Network Training
The Frank-Wolfe method and its extensions are well-suited for delivering...

07/17/2016 · Learning Unitary Operators with Help From u(n)
A major challenge in the training of recurrent neural networks is the so...

02/23/2021 · Histo-fetch – On-the-fly processing of gigapixel whole slide images simplifies and speeds neural network training
We created a custom pipeline (histo-fetch) to efficiently extract random...
