Net-Trim: Convex Pruning of Deep Neural Networks with Performance Guarantee

by Alireza Aghasi, et al.

We introduce and analyze a new technique for model reduction for deep neural networks. While large networks are theoretically capable of learning arbitrarily complex models, overfitting and model redundancy negatively affect the prediction accuracy and model variance. Our Net-Trim algorithm prunes (sparsifies) a trained network layer-wise, removing connections at each layer by solving a convex optimization program. This program seeks a sparse set of weights at each layer that keeps the layer inputs and outputs consistent with the originally trained model. The algorithms and associated analysis are applicable to neural networks operating with the rectified linear unit (ReLU) as the nonlinear activation. We present both parallel and cascade versions of the algorithm. While the latter can achieve slightly simpler models with the same generalization performance, the former can be computed in a distributed manner. In both cases, Net-Trim significantly reduces the number of connections in the network, while also providing enough regularization to slightly reduce the generalization error. We also provide a mathematical analysis of the consistency between the initial network and the retrained model. To analyze the model sample complexity, we derive the general sufficient conditions for the recovery of a sparse transform matrix. For a single layer taking independent Gaussian random vectors of length N as inputs, we show that if the network response can be described using a maximum number of s non-zero weights per node, these weights can be learned from O(s log N) samples.
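
As a rough illustration of the per-layer program described above, one convex formulation of this kind can be written as follows. The notation is an assumption introduced here and does not appear in the abstract: X holds the layer inputs as columns, W is the trained weight matrix, Y = max(W^T X, 0) is the trained layer's ReLU output, Omega is the set of entries where Y is positive, and epsilon is a tolerance on the output mismatch. This is a sketch under those assumptions, not necessarily the exact program analyzed in the paper.

```latex
% Assumed notation (not from the abstract):
%   X       : matrix whose columns are the layer inputs for the training samples
%   Y       : trained layer output, Y = max(W^T X, 0)
%   Omega   : index set of entries where Y > 0 (active ReLU units)
%   epsilon : allowed discrepancy between retrained and original outputs
\begin{aligned}
\hat{U} = \arg\min_{U} \quad & \|U\|_1 \\
\text{subject to} \quad & \big\| (U^\top X - Y)_{\Omega} \big\|_F \le \epsilon, \\
& (U^\top X)_{\Omega^c} \le 0 .
\end{aligned}
```

The entrywise l1 objective promotes sparse weights. On Omega the ReLU is active, so the output can be matched through a constraint on the linear term U^T X; off Omega it suffices that U^T X stays non-positive so the ReLU output remains zero. Both constraints are convex in U, which is what makes the layer-wise pruning step a convex program.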




Fast Convex Pruning of Deep Neural Networks

We develop a fast, tractable technique called Net-Trim for simplifying a...

Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks

The lottery ticket hypothesis (LTH) states that learning on a properly p...

Sparsity-depth Tradeoff in Infinitely Wide Deep Neural Networks

We investigate how sparse neural activity affects the generalization per...

Overparameterized ReLU Neural Networks Learn the Simplest Models: Neural Isometry and Exact Recovery

The practice of deep learning has shown that neural networks generalize ...

Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural Networks

Due to the significant computational challenge of training large-scale g...

On the Expressive Power of Deep Neural Networks

We propose a new approach to the problem of neural network expressivity,...

Progressive Learning for Systematic Design of Large Neural Networks

We develop an algorithm for systematic design of a large artificial neur...
