Overfreezing Meets Overparameterization: A Double Descent Perspective on Transfer Learning of Deep Neural Networks

by Yehuda Dar et al.

We study the generalization behavior of transfer learning of deep neural networks (DNNs). We adopt the overparameterization perspective – featuring interpolation of the training data (i.e., approximately zero train error) and the double descent phenomenon – to explain the delicate effect of the transfer learning setting on generalization performance. We study how the generalization behavior of transfer learning is affected by the dataset size in the source and target tasks, the number of transferred layers that are kept frozen during the target DNN training, and the similarity between the source and target tasks. We show that the test error evolution during the target DNN training has a more significant double descent effect when the target training dataset is sufficiently large and contains some label noise. In addition, a larger source training dataset can delay the arrival at interpolation and at the double descent peak in the target DNN training. Moreover, we demonstrate that the number of frozen layers can determine whether the transfer learning is effectively underparameterized or overparameterized and, in turn, this may affect the relative success or failure of learning. Specifically, we show that freezing too many layers may make a transfer from a less related source task better than, or on par with, a transfer from a more related source task; we call this case overfreezing. We establish our results using image classification experiments with the residual network (ResNet) and vision transformer (ViT) architectures.




