Advancing Model Pruning via Bi-level Optimization

by Yihua Zhang, et al.

Deployment constraints in practical applications necessitate the pruning of large-scale deep learning models, i.e., promoting their weight sparsity. As illustrated by the Lottery Ticket Hypothesis (LTH), pruning also has the potential to improve their generalization ability. At the core of LTH, iterative magnitude pruning (IMP) is the predominant pruning method for successfully finding 'winning tickets'. Yet the computation cost of IMP grows prohibitively as the targeted pruning ratio increases. To reduce the computation overhead, various efficient 'one-shot' pruning methods have been developed, but these schemes are usually unable to find winning tickets as good as those found by IMP. This raises the question of how to close the gap between pruning accuracy and pruning efficiency. To tackle it, we pursue the algorithmic advancement of model pruning. Specifically, we formulate the pruning problem from a fresh viewpoint, bi-level optimization (BLO). We show that the BLO interpretation provides a technically grounded optimization basis for an efficient implementation of the pruning-retraining learning paradigm used in IMP. We also show that the proposed bi-level optimization-oriented pruning method (termed BiP) belongs to a special class of BLO problems with a bi-linear problem structure. By leveraging this bi-linearity, we theoretically show that BiP can be solved as easily as first-order optimization, thus inheriting the computation efficiency of first-order methods. Through extensive experiments on both structured and unstructured pruning with 5 model architectures and 4 data sets, we demonstrate that BiP can find better winning tickets than IMP in most cases and is computationally as efficient as the one-shot pruning schemes, achieving a 2-7 times speedup over IMP for the same level of model accuracy and sparsity.
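To make the cost argument concrete, the following is a minimal sketch of magnitude pruning and of how the number of IMP prune-retrain rounds grows with the target sparsity. It is not the authors' implementation: the function names `magnitude_mask` and `imp_rounds` are hypothetical, and the schedule of removing 20% of the remaining weights per round is an assumption borrowed from common IMP practice, not something stated in the abstract.

```python
import math
import numpy as np

def magnitude_mask(weights, sparsity):
    """Return a boolean mask that drops the `sparsity` fraction of
    smallest-magnitude entries (True = keep, False = pruned)."""
    flat = np.abs(weights).ravel()
    k = int(round(sparsity * flat.size))  # number of weights to remove
    mask = np.ones(flat.size, dtype=bool)
    if k > 0:
        # Indices of the k smallest magnitudes (order among them is irrelevant).
        drop = np.argpartition(flat, k - 1)[:k]
        mask[drop] = False
    return mask.reshape(weights.shape)

def imp_rounds(target_sparsity, per_round_rate=0.2):
    """Number of prune-retrain rounds needed when every round removes
    `per_round_rate` of the weights that are still left (assumed schedule)."""
    remaining = 1.0 - target_sparsity
    return math.ceil(math.log(remaining) / math.log(1.0 - per_round_rate))

# One-shot pruning applies a single mask at the target sparsity.
w = np.array([[0.1, -2.0], [0.05, 3.0]])
one_shot = magnitude_mask(w, 0.5)

# IMP needs more retraining rounds as the target sparsity rises.
rounds_at_50 = imp_rounds(0.5)
rounds_at_90 = imp_rounds(0.9)
```

Under this assumed schedule, every round includes a full retraining pass, so the total cost of IMP scales roughly with the round count, whereas a one-shot method pays for a single pruning step.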




