Is the Skip Connection Provable to Reform the Neural Network Loss Landscape?

by Lifu Wang, et al.

The residual network is now one of the most effective structures in deep learning. It uses skip connections to “guarantee” that performance will not get worse. However, the non-convexity of the neural network makes it unclear whether skip connections provably improve the learning ability, since the nonlinearity may create many local minima. Some previous works <cit.> show that, despite the non-convexity, the loss landscape of the two-layer ReLU network has good properties when the number m of hidden nodes is very large. In this paper, we follow this line to study the topology (sub-level sets) of the loss landscape of deep ReLU neural networks with a skip connection. We theoretically prove that the skip connection network inherits the good properties of the two-layer network, and that skip connections help to control the connectedness of the sub-level sets, such that any local minimum worse than the global minimum of some two-layer ReLU network will be very “shallow”. The “depth” of these local minima is at most O(m^(η-1)/n), where n is the input dimension and η<1. This provides a theoretical explanation for the effectiveness of the skip connection in deep learning.
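The intuition that a skip connection lets a deeper network fall back to a shallower one can be illustrated with a minimal sketch. The block form `h + V2 * relu(V1 * h)`, the helper names, and all weights below are assumptions chosen for illustration and are not taken from the paper's construction. With the residual branch's output weights set to zero, the block reduces to the identity, so the network with the skip connection reproduces the two-layer ReLU network exactly.

```python
# Illustrative sketch only: the block form and every weight here are
# hypothetical, not the architecture analysed in the paper.

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def two_layer(x, W1, W2):
    # Plain two-layer ReLU network: W2 * relu(W1 * x).
    return matvec(W2, relu(matvec(W1, x)))

def residual_block(h, V1, V2):
    # Skip connection: the input h is added to the branch output.
    branch = matvec(V2, relu(matvec(V1, h)))
    return [a + b for a, b in zip(h, branch)]

W1 = [[1.0, -2.0], [0.5, 0.3], [-1.0, 1.0]]  # m = 3 hidden nodes, n = 2 inputs
W2 = [[0.2, -0.4, 1.0]]
x = [1.0, 2.0]

base = two_layer(x, W1, W2)

V1 = [[0.7], [-0.3]]
V2_zero = [[0.0, 0.0]]   # zeroed branch: the block acts as the identity
V2_active = [[1.0, 0.0]]  # non-zero branch: the block changes the output

with_skip = residual_block(base, V1, V2_zero)
perturbed = residual_block(base, V1, V2_active)

print(with_skip == base)   # the skip network reproduces the two-layer output
print(perturbed != base)   # an active branch moves away from it
```

This only shows that the two-layer network's outputs are representable by the deeper network. The paper's claim about the connectedness of sub-level sets and the depth of local minima is a much stronger statement that this sketch does not demonstrate.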


Adding One Neuron Can Eliminate All Bad Local Minima

One of the main difficulties in analyzing neural networks is the non-con...

Universal flow approximation with deep residual networks

Residual networks (ResNets) are a deep learning architecture with the re...

Skip Connections Eliminate Singularities

Skip connections made the training of very deep networks possible and ha...

Rethinking Skip Connection with Layer Normalization in Transformers and ResNets

Skip connection, is a widely-used technique to improve the performance a...

The layer-wise L1 Loss Landscape of Neural Nets is more complex around local minima

For fixed training data and network parameters in the other layers the L...

SML:Enhance the Network Smoothness with Skip Meta Logit for CTR Prediction

In light of the smoothness property brought by skip connections in ResNe...

A global analysis of global optimisation

Theoretical understanding of the training of deep neural networks has ma...
