Convergence analysis for gradient flows in the training of artificial neural networks with ReLU activation

07/09/2021
by Arnulf Jentzen, et al.

Gradient descent (GD) type optimization schemes are the standard methods to train artificial neural networks (ANNs) with rectified linear unit (ReLU) activation. Such schemes can be regarded as discretizations of the gradient flows (GFs) associated with the training of ANNs with ReLU activation, and most of the key difficulties in the mathematical convergence analysis of GD type optimization schemes for such ANNs seem to be present already in the dynamics of the corresponding GF differential equations. The key subject of this work is to analyze these GF differential equations in the training of ANNs with ReLU activation and three layers (one input layer, one hidden layer, and one output layer). In particular, we prove, in the case where the target function is possibly multi-dimensional and continuous and where the probability distribution of the input data is absolutely continuous with respect to the Lebesgue measure, that the risk of every bounded GF trajectory converges to the risk of a critical point. In addition, we show, in the case of a 1-dimensional affine linear target function and where the probability distribution of the input data coincides with the standard uniform distribution, that the risk of every bounded GF trajectory converges to zero if the initial risk is sufficiently small. Finally, in the special situation where there is only one neuron on the hidden layer (1-dimensional hidden layer), we strengthen the above result for affine linear target functions by proving that the risk of every (not necessarily bounded) GF trajectory converges to zero if the initial risk is sufficiently small.
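As a minimal sketch of the setting (the notation below is illustrative, assuming the standard squared-error risk, and is not quoted from the paper): for a three-layer ReLU ANN $\mathcal{N}_\theta$ with parameter vector $\theta \in \mathbb{R}^{\mathfrak{d}}$, a target function $f$, and an input distribution $\mu$, the risk is

\[ \mathcal{L}(\theta) = \int \big\| f(x) - \mathcal{N}_\theta(x) \big\|^2 \, \mu(\mathrm{d}x), \]

and a GF trajectory is a solution $\Theta \colon [0,\infty) \to \mathbb{R}^{\mathfrak{d}}$ of the differential equation

\[ \tfrac{\mathrm{d}}{\mathrm{d}t} \Theta_t = - \mathcal{G}(\Theta_t), \qquad t \in [0,\infty), \]

where $\mathcal{G}$ denotes an appropriate (generalized) gradient of $\mathcal{L}$, since the non-differentiability of the ReLU activation makes $\mathcal{L}$ non-differentiable in general. GD type schemes arise as explicit Euler discretizations $\theta_{n+1} = \theta_n - \gamma_n \, \mathcal{G}(\theta_n)$ of this equation.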


