With Greater Distance Comes Worse Performance: On the Perspective of Layer Utilization and Model Generalization

01/28/2022
by James Wang, et al.

Generalization of deep neural networks remains one of the main open problems in machine learning. Previous theoretical work focused on deriving tight bounds on model complexity, while empirical work revealed that neural networks exhibit double descent with respect to both the number of training samples and the network size. In this paper, we empirically examine how different layers of a neural network contribute to the model: we find that early layers generally learn representations relevant to performance on both training and test data, whereas deeper layers only minimize the training risk and fail to generalize well on test data or in the presence of mislabeled data. We further show that the distance of the final layers' trained weights from their initial values is highly correlated with the generalization error and can serve as an indicator of overfitting. Moreover, we present evidence supporting post-training regularization by re-initializing the weights of the final layers. Our findings provide an efficient way to estimate the generalization capability of a neural network, and these quantitative results may inspire the derivation of tighter generalization bounds that take the internal structure of the network into consideration.
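
The abstract refers to two operational quantities: a per-layer distance between trained weights and their initialization, and a post-training re-initialization of the final layers. The sketch below is a minimal illustration of both ideas, not the authors' released code; the toy model, the parameter-name prefixes, and the use of the L2 norm as the distance measure are assumptions made for the example.

import copy
import torch
import torch.nn as nn


def layerwise_distance_from_init(model, init_state):
    """Per-parameter L2 distance between current weights and the saved initialization."""
    current = model.state_dict()
    return {
        name: torch.norm(current[name] - w0).item()
        for name, w0 in init_state.items()
        if torch.is_floating_point(w0)
    }


def reinit_final_layers(model, init_state, prefixes):
    """Reset parameters whose names start with any given prefix back to their initial values."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            if any(name.startswith(p) for p in prefixes):
                param.copy_(init_state[name])


if __name__ == "__main__":
    # Toy stand-in for a deep network; the paper's experiments use larger architectures.
    model = nn.Sequential(
        nn.Linear(32, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, 10),
    )
    init_state = copy.deepcopy(model.state_dict())  # snapshot taken at initialization

    # ... training loop would go here ...

    # Large distances in the final layers are read as a sign of overfitting.
    for name, dist in layerwise_distance_from_init(model, init_state).items():
        print(f"{name}: {dist:.4f}")

    # Post-training regularization: roll the last Linear layer ("4.") back to its init.
    reinit_final_layers(model, init_state, prefixes=["4."])

In practice, one would compare the per-layer distances against the observed generalization gap; the abstract's claim is that the final layers' distances track that gap most closely.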

Related research

05/30/2021
On the geometry of generalization and memorization in deep neural networks
Understanding how large neural networks avoid memorizing training data i...

05/31/2023
The Tunnel Effect: Building Data Representations in Deep Neural Networks
Deep neural networks are widely known for their remarkable effectiveness...

06/28/2023
On information captured by neural networks: connections with memorization and generalization
Despite the popularity and success of deep learning, there is limited un...

02/06/2019
Are All Layers Created Equal?
Understanding learning and generalization of deep architectures has been...

02/12/2020
Topologically Densified Distributions
We study regularization in the context of small sample-size learning wit...

11/23/2022
Relating Regularization and Generalization through the Intrinsic Dimension of Activations
Given a pair of models with similar training set performance, it is natu...

11/17/2019
Encouraging an Appropriate Representation Simplifies Training of Neural Networks
A common assumption about neural networks is that they can learn an appr...
