Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks

by Shuai Zhang, et al.
Rensselaer Polytechnic Institute
Michigan State University
University at Buffalo

The lottery ticket hypothesis (LTH) states that learning on a properly pruned network (the winning ticket) improves test accuracy over the original unpruned network. Although LTH has been justified empirically in a broad range of deep neural network (DNN) applications such as computer vision and natural language processing, the theoretical validation of the improved generalization of a winning ticket remains elusive. To the best of our knowledge, our work, for the first time, characterizes the performance of training a pruned neural network by analyzing the geometric structure of the objective function and the sample complexity required to achieve zero generalization error. We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned, indicating the structural importance of a winning ticket. Moreover, when the algorithm for training a pruned neural network is specified as an (accelerated) stochastic gradient descent algorithm, we theoretically show that the number of samples required for achieving zero generalization error is proportional to the number of non-pruned weights in the hidden layer. With a fixed number of samples, training a pruned neural network enjoys a faster convergence rate to the desired model than training the original unpruned one, providing a formal justification of the improved generalization of the winning ticket. Our theoretical results are derived from learning a pruned neural network of one hidden layer, while experimental results are further provided to justify the implications in pruning multi-layer neural networks.
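The pruning-then-retraining setup the abstract analyzes can be illustrated with a minimal sketch. The code below is an assumption-laden toy, not the paper's algorithm: it trains a one-hidden-layer ReLU network with plain gradient descent on synthetic data from a hypothetical sparse "teacher" network, keeps the 50% largest-magnitude first-layer weights as the mask, rewinds the surviving weights to their initial values, and retrains only the non-pruned weights (the names `train`, `forward`, and the teacher `W_true` are all invented for illustration).

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def forward(X, W, v, mask):
    # one-hidden-layer network; the pruning mask zeroes out first-layer weights
    return relu(X @ (W * mask)) @ v

def train(X, y, W, v, mask, lr=0.05, epochs=300):
    # gradient descent on squared loss; first-layer gradients are masked so
    # pruned weights stay at zero (the "winning ticket" sparsity pattern)
    for _ in range(epochs):
        H = relu(X @ (W * mask))            # n x k hidden activations
        err = H @ v - y                     # residuals, length n
        grad_v = H.T @ err / len(y)
        dH = np.outer(err, v) * (H > 0)     # backprop through ReLU
        grad_W = (X.T @ dH / len(y)) * mask
        W -= lr * grad_W
        v -= lr * grad_v
    return W, v

rng = np.random.default_rng(0)
d, k, n = 10, 8, 200
X = rng.normal(size=(n, d))
# hypothetical sparse teacher network generating the labels
W_true = rng.normal(size=(d, k)) * (rng.random((d, k)) < 0.3)
v_true = rng.normal(size=k)
y = relu(X @ W_true) @ v_true

# step 1: train the dense network from a random initialization
W0 = rng.normal(size=(d, k)) * 0.5
v0 = rng.normal(size=k) * 0.5
W_dense, v_dense = train(X, y, W0.copy(), v0.copy(), np.ones((d, k)))

# step 2: keep the 50% largest-magnitude trained weights, rewind the
# survivors to their initial values, and retrain the pruned network
thresh = np.quantile(np.abs(W_dense), 0.5)
mask = (np.abs(W_dense) >= thresh).astype(float)
W_pruned, v_pruned = train(X, y, W0.copy() * mask, v0.copy(), mask)

loss_pruned = np.mean((forward(X, W_pruned, v_pruned, mask) - y) ** 2)
print(f"pruned-network training loss: {loss_pruned:.4f}")
```

In the paper's terms, the mask fixes which hidden-layer weights survive, and the sample-complexity result ties the number of training samples needed to the count of non-pruned entries in `mask` rather than to the full dimension of `W`.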


Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural Networks

Due to the significant computational challenge of training large-scale g...

Learning and generalization of one-hidden-layer neural networks, going beyond standard Gaussian data

This paper analyzes the convergence and generalization of training a one...

Net-Trim: Convex Pruning of Deep Neural Networks with Performance Guarantee

We introduce and analyze a new technique for model reduction for deep ne...

Fast Convex Pruning of Deep Neural Networks

We develop a fast, tractable technique called Net-Trim for simplifying a...

Learning the mapping x ↦ ∑_{i=1}^d x_i^2: the cost of finding the needle in a haystack

The task of using machine learning to approximate the mapping x ↦ ∑_{i=1}^d x...

ResMem: Learn what you can and memorize the rest

The impressive generalization performance of modern neural networks is a...

Geometry Perspective Of Estimating Learning Capability Of Neural Networks

The paper uses statistical and differential geometric motivation to acqu...
