Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances

by Berfin Simsek, et al.

We study how permutation symmetries in overparameterized multi-layer neural networks generate 'symmetry-induced' critical points. Assuming a network with L layers of minimal widths r_1^*, …, r_{L-1}^* reaches a zero-loss minimum at r_1^*! ⋯ r_{L-1}^*! isolated points that are permutations of one another, we show that adding one extra neuron to each layer is sufficient to connect all these previously discrete minima into a single manifold. For a two-layer overparameterized network of width m := r^* + h, we explicitly describe the manifold of global minima: it consists of T(r^*, m) affine subspaces of dimension at least h that are connected to one another. For a network of width m, we identify the number G(r, m) of affine subspaces containing only symmetry-induced critical points that are related to the critical points of a smaller network of width r < r^*. Via a combinatorial analysis, we derive closed-form formulas for T and G and show that the number of symmetry-induced critical subspaces dominates the number of affine subspaces forming the global minima manifold in the mildly overparameterized regime (small h), and vice versa in the vastly overparameterized regime (h ≫ r^*). Our results provide new insights into the minimization of the non-convex loss function of overparameterized neural networks.
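The permutation symmetry behind the count r_1^*! ⋯ r_{L-1}^*! can be checked numerically. The sketch below is an illustration under assumptions, not the paper's construction: it uses a two-layer network with a tanh activation (the abstract does not fix an activation), and the names `two_layer`, `count_permuted_minima`, and `split_last` are hypothetical. It also shows the standard duplicate-and-split observation that one extra neuron yields a line of parameters computing the same function, which hints at why added width can connect isolated points. It does not compute T or G, whose closed forms are not stated in the abstract.

```python
import itertools
import math
import random


def two_layer(x, W, a):
    """f(x) = sum_j a[j] * tanh(W[j] . x); tanh is an assumed activation."""
    return sum(
        a_j * math.tanh(sum(w * xi for w, xi in zip(w_j, x)))
        for w_j, a_j in zip(W, a)
    )


def count_permuted_minima(widths):
    """Product r_1! * ... * r_{L-1}! over the given hidden-layer widths."""
    return math.prod(math.factorial(r) for r in widths)


def split_last(W, a, mu):
    """Duplicate the last hidden neuron and split its outgoing weight.

    The two copies get outgoing weights mu * a[-1] and (1 - mu) * a[-1],
    so their contributions sum to the original one for every real mu.
    """
    return W + [W[-1]], a[:-1] + [mu * a[-1], (1 - mu) * a[-1]]


rng = random.Random(0)
r, d = 3, 2  # hidden width and input dimension (arbitrary small values)
W = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(r)]
a = [rng.gauss(0, 1) for _ in range(r)]
x = [0.3, -1.2]

base = two_layer(x, W, a)

# Relabel the hidden neurons in all r! ways; the output should not change.
perm_outputs = [
    two_layer(x, [W[i] for i in p], [a[i] for i in p])
    for p in itertools.permutations(range(r))
]

# Width r + 1: every mu gives the same function, i.e. a line in weight space.
split_outputs = [two_layer(x, *split_last(W, a, mu)) for mu in (-1.0, 0.25, 2.0)]

print(len(perm_outputs), count_permuted_minima([3, 2]))
print(all(math.isclose(v, base, rel_tol=1e-9) for v in perm_outputs + split_outputs))
```

Comparisons use `math.isclose` rather than exact equality because permuting the neurons changes the order of floating-point summation.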




