Landscape Connectivity and Dropout Stability of SGD Solutions for Over-parameterized Neural Networks

by Alexander Shevchenko et al.

The optimization of multilayer neural networks typically leads to a solution with zero training error, yet the landscape can exhibit spurious local minima and the minima can be disconnected. In this paper, we shed light on this phenomenon: we show that the combination of stochastic gradient descent (SGD) and over-parameterization makes the landscape of multilayer neural networks approximately connected and thus more favorable to optimization. More specifically, we prove that SGD solutions are connected via a piecewise linear path, and the increase in loss along this path vanishes as the number of neurons grows large. This result is a consequence of the fact that the parameters found by SGD are increasingly dropout stable as the network becomes wider. We show that, if we remove part of the neurons (and suitably rescale the remaining ones), the change in loss is independent of the total number of neurons, and it depends only on how many neurons are left. Our results exhibit a mild dependence on the input dimension: they are dimension-free for two-layer networks and depend linearly on the dimension for multilayer networks. We validate our theoretical findings with numerical experiments for different architectures and classification tasks.
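The dropout-stability property described above can be illustrated numerically. The sketch below (my own illustration, not code from the paper) builds a two-layer ReLU network in the mean-field parameterization, where the output is an average over the hidden neurons; dropout stability then amounts to the loss changing little when we keep only a random subset of k neurons and rescale the average from 1/N to 1/k. Here the weights are random rather than SGD-trained, so this only shows the concentration effect for wide networks, not the paper's full result; all names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

d, N, n = 10, 2000, 200           # input dim, hidden width, number of samples
X = rng.normal(size=(n, d))       # synthetic inputs
y = rng.normal(size=n)            # synthetic targets

W = rng.normal(size=(N, d))       # first-layer weights (one row per neuron)
a = rng.normal(size=N)            # second-layer weights

def forward(W_sub, a_sub, width):
    # Mean-field scaling: the output is the average over `width` neurons.
    return np.maximum(X @ W_sub.T, 0.0) @ a_sub / width

def mse(pred):
    return np.mean((pred - y) ** 2)

loss_full = mse(forward(W, a, N))

# "Dropout": keep a random half of the neurons and rescale the average
# accordingly (divide by k instead of N).
k = N // 2
keep = rng.choice(N, size=k, replace=False)
loss_drop = mse(forward(W[keep], a[keep], k))

# For large N the gap shrinks: the dropped-out output is an average over
# k still-large i.i.d.-like terms, so it concentrates near the full output.
print(abs(loss_drop - loss_full))
```

Increasing `N` (with `k = N // 2`) makes the printed gap shrink, mirroring the claim that the change in loss depends on how many neurons are kept rather than on the total width.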


