The loss surface of deep and wide neural networks

04/26/2017
by Quynh Nguyen, et al.

While the optimization problem behind deep neural networks is highly non-convex, it is frequently observed in practice that deep networks can be trained without getting stuck in suboptimal points. It has been argued that this is because all local minima are close to being globally optimal. We show that this is (almost) true: almost all local minima are in fact globally optimal for a fully connected network with squared loss and an analytic activation function, provided that one hidden layer of the network has more units than there are training points and the network structure from this layer on is pyramidal.
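
To make the architectural condition in the abstract concrete, here is a minimal, hypothetical sketch (not code from the paper): the function `satisfies_wide_pyramidal_condition` and the layer-width list convention are assumptions introduced purely for illustration. It checks whether some hidden layer has more units than there are training points and whether the layer widths are non-increasing (pyramidal) from that layer to the output.

```python
def satisfies_wide_pyramidal_condition(layer_widths, num_training_points):
    """Illustrative check of the condition described in the abstract:
    some hidden layer has more units than training points, and the layer
    widths are non-increasing ("pyramidal") from that layer to the output.

    layer_widths: [input_dim, hidden_1, ..., hidden_k, output_dim]
    """
    # Scan hidden layers only (exclude input and output dimensions).
    for k, width in enumerate(layer_widths[1:-1], start=1):
        if width > num_training_points:
            tail = layer_widths[k:]  # widths from this layer to the output
            if all(a >= b for a, b in zip(tail, tail[1:])):
                return True
    return False


# N = 100 training points; one hidden layer (128) exceeds N and the widths
# shrink monotonically afterwards, so the condition holds.
print(satisfies_wide_pyramidal_condition([10, 128, 64, 32, 1], 100))  # True

# No hidden layer exceeds 100 units, so the condition fails.
print(satisfies_wide_pyramidal_condition([10, 64, 50, 32, 1], 100))   # False
```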

Related research

12/16/2018 · Non-attracting Regions of Local Minima in Deep and Wide Neural Networks
Understanding the loss surface of neural networks is essential for the d...

11/09/2016 · Diverse Neural Network Learns True Target Functions
Neural networks are a powerful class of functions that can be trained wi...

12/28/2018 · Over-Parameterized Deep Neural Networks Have No Strict Local Minima For Any Continuous Activations
In this paper, we study the loss surface of the over-parameterized fully...

10/28/2016 · Globally Optimal Training of Generalized Polynomial Neural Networks with Nonlinear Spectral Methods
The optimization problem behind neural networks is highly non-convex. Tr...

01/04/2007 · Statistical tools to assess the reliability of self-organizing maps
Results of neural network learning are always subject to some variabilit...

04/06/2018 · The Loss Surface of XOR Artificial Neural Networks
Training an artificial neural network involves an optimization process o...

06/06/2021 · On the Power of Shallow Learning
A deluge of recent work has explored equivalences between wide neural ne...
