Weight-space symmetry in deep networks gives rise to permutation saddles, connected by equal-loss valleys across the loss landscape

by Johanni Brea et al.

The permutation symmetry of neurons in each layer of a deep neural network gives rise not only to multiple equivalent global minima of the loss function, but also to first-order saddle points located on the paths between two equivalent global minima. In a network of d−1 hidden layers with n_k neurons in layers k = 1, ..., d, we construct smooth paths between equivalent global minima that lead through a `permutation point' where the input and output weight vectors of two neurons in the same hidden layer k collide and interchange. We show that such permutation points are critical points with at least n_{k+1} vanishing eigenvalues of the Hessian matrix of second derivatives, indicating a local plateau of the loss function. We find that a permutation point for the exchange of neurons i and j transits into a flat valley (or, more generally, an extended plateau of n_{k+1} flat dimensions) that enables all n_k! permutations of the neurons in a given layer k at the same loss value. Moreover, we introduce high-order permutation points by exploiting the recursive structure of neural network functions, and find that the number of Kth-order permutation points exceeds the (already huge) number of equivalent global minima at least by a factor ∑_{k=1}^{d−1} (1/2!^K) (n_k − K choose K). In two tasks, we illustrate numerically that some of the permutation points correspond to first-order saddles (`permutation saddles'): first, in a toy network with a single hidden layer on a function approximation task and, second, in a multilayer network on the MNIST task. Our geometric approach yields a lower bound on the number of critical points generated by weight-space symmetries and provides a simple intuitive link between previous mathematical results and numerical observations.
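The permutation symmetry at the heart of the abstract is easy to check numerically. The sketch below (plain NumPy on a hypothetical one-hidden-layer network, not the authors' code) permutes the hidden neurons by reordering the rows of the incoming weight matrix together with the matching entries of the bias and columns of the outgoing weight matrix; the network function is unchanged, which is why all n_k! permutations of a trained layer sit at the same loss value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-hidden-layer network: x -> W2 @ tanh(W1 @ x + b1) + b2
n_in, n_hidden, n_out = 3, 5, 2
W1 = rng.normal(size=(n_hidden, n_in))
b1 = rng.normal(size=n_hidden)
W2 = rng.normal(size=(n_out, n_hidden))
b2 = rng.normal(size=n_out)

def forward(x, W1, b1, W2, b2):
    return W2 @ np.tanh(W1 @ x + b1) + b2

# Permute the hidden neurons: reorder rows of W1 and entries of b1,
# and the matching columns of W2.  The function the network computes
# is identical, so the loss is identical at every such permutation.
perm = rng.permutation(n_hidden)
x = rng.normal(size=n_in)
y_orig = forward(x, W1, b1, W2, b2)
y_perm = forward(x, W1[perm], b1[perm], W2[:, perm], b2)
assert np.allclose(y_orig, y_perm)
```

The same bookkeeping extends layer by layer in a deep network, giving the ∏_k n_k! equivalent copies of any global minimum mentioned above.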

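The combinatorial lower bound in the abstract can likewise be evaluated directly. The sketch below assumes the factor reads ∑_{k=1}^{d−1} (1/2!^K) · C(n_k − K, K), with C the binomial coefficient (my reading of the formula as printed), and compares it for a small network against the ∏_k n_k! count of equivalent global minima.

```python
from math import comb, factorial, prod

def permutation_point_factor(hidden_sizes, K):
    """Assumed lower-bound factor by which Kth-order permutation
    points outnumber equivalent global minima:
    sum over hidden layers k of C(n_k - K, K) / 2!**K."""
    return sum(comb(n_k - K, K) / factorial(2) ** K for n_k in hidden_sizes)

def equivalent_minima(hidden_sizes):
    """Number of equivalent global minima generated by layer-wise
    permutations of hidden neurons: product of n_k! over layers."""
    return prod(factorial(n_k) for n_k in hidden_sizes)

hidden = [10, 10]  # two hidden layers of 10 neurons each
print(equivalent_minima(hidden))            # 10! * 10! = 13168189440000
print(permutation_point_factor(hidden, 1))  # 2 * C(9, 1)/2 = 9.0
```

Even for first-order points (K = 1) on this tiny network, the bound multiplies a count of equivalent minima that is already in the trillions.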



On the High Symmetry of Neural Network Functions

Training neural networks means solving a high-dimensional optimization p...

Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances

We study how permutation symmetries in overparameterized multi-layer neu...

Complex Critical Points of Deep Linear Neural Networks

We extend the work of Mehta, Chen, Tang, and Hauenstein on computing the...

Analytic Study of Families of Spurious Minima in Two-Layer ReLU Neural Networks

We study the optimization problem associated with fitting two-layer ReLU...

On the emergence of tetrahedral symmetry in the final and penultimate layers of neural network classifiers

A recent numerical study observed that neural network classifiers enjoy ...

Engineering Monosemanticity in Toy Models

In some neural networks, individual neurons correspond to natural “featu...

Semi-flat minima and saddle points by embedding neural networks to overparameterization

We theoretically study the landscape of the training error for neural ne...
