Global minimizers, strict and non-strict saddle points, and implicit regularization for deep linear neural networks

by El Mehdi Achour et al.

In non-convex settings, it is well established that gradient-based algorithms behave differently in the vicinity of local structures of the objective function, such as strict and non-strict saddle points, local and global minima, and maxima. It is therefore crucial to describe the landscape of non-convex problems, that is, to describe as precisely as possible the set of points in each of the above categories. In this work, we study the landscape of the empirical risk associated with deep linear neural networks and the square loss. It is known that, under weak assumptions, this objective function has no spurious local minima and no local maxima. We go a step further and characterize, among all critical points, which are global minimizers, strict saddle points, and non-strict saddle points. We enumerate all the associated critical values. The characterization is simple, involves conditions on the ranks of partial matrix products, and sheds some light on the global convergence and implicit regularization phenomena that have been proved or observed when optimizing linear neural networks. In passing, we also provide an explicit parameterization of the set of all global minimizers and exhibit large sets of strict and non-strict saddle points.
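To make the objects in the abstract concrete, here is a minimal NumPy sketch of the empirical risk of a deep linear network under the square loss, together with the ranks of the partial matrix products in which the characterization is phrased. All dimensions, data, and weights below are hypothetical and for illustration only; this is not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small instance: a depth-3 linear network x -> W3 @ W2 @ W1 @ x
# trained with the square loss. Dimensions are illustrative choices.
n, d_in, d_hid, d_out = 50, 4, 3, 2
X = rng.standard_normal((d_in, n))   # inputs, one column per sample
Y = rng.standard_normal((d_out, n))  # targets
W1 = rng.standard_normal((d_hid, d_in))
W2 = rng.standard_normal((d_hid, d_hid))
W3 = rng.standard_normal((d_out, d_hid))

def empirical_risk(W1, W2, W3):
    """Square-loss empirical risk of the end-to-end linear map W3 W2 W1."""
    residual = W3 @ W2 @ W1 @ X - Y
    return 0.5 * np.sum(residual**2) / n

# Ranks of the partial matrix products: these are the quantities that the
# characterization of global minimizers vs. strict/non-strict saddles
# is stated in terms of.
partial_ranks = {
    "rank(W1)": np.linalg.matrix_rank(W1),
    "rank(W2 W1)": np.linalg.matrix_rank(W2 @ W1),
    "rank(W3 W2 W1)": np.linalg.matrix_rank(W3 @ W2 @ W1),
}

print(empirical_risk(W1, W2, W3))
print(partial_ranks)
```

Note that the rank of the end-to-end product is bounded by the smallest layer width, which is why narrow hidden layers act as a rank constraint on the learned linear map.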


