Ensemble Knowledge Guided Sub-network Search and Fine-tuning for Filter Pruning

03/05/2022
by   Seunghyun Lee, et al.

Conventional NAS-based pruning algorithms aim to find the sub-network with the best validation performance. However, validation performance does not reliably represent test performance, i.e., potential performance. In addition, although fine-tuning the pruned network to recover the performance drop is an inevitable step, few studies have addressed this issue. This paper proposes a novel sub-network search and fine-tuning method named Ensemble Knowledge Guidance (EKG). First, we experimentally show that the fluctuation of the loss landscape is an effective metric for evaluating potential performance. To search for a sub-network with the smoothest loss landscape at low cost, we propose a pseudo-supernet built by ensemble sub-network knowledge distillation. Next, we propose a novel fine-tuning scheme that re-uses information from the search phase: we store the interim sub-networks, i.e., the by-products of the search, and transfer their knowledge into the pruned network. Note that EKG is easy to plug in and computationally efficient. For example, in the case of ResNet-50, about 45% of FLOPS are removed without any performance drop in only 315 GPU hours. The implemented code is available at https://github.com/sseung0703/EKG.
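The core idea of transferring ensemble knowledge from the stored interim sub-networks can be sketched as a standard distillation loss whose teacher distribution is the average of the interim sub-networks' softened outputs. This is a minimal illustration, not the paper's exact formulation; the function names, temperature value, and logit shapes here are assumptions for the sketch.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-softened softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_kd_loss(student_logits, interim_logits_list, T=4.0):
    """Hypothetical ensemble-distillation loss.

    The teacher is the average of the softened distributions produced by the
    interim sub-networks stored during the search phase; the loss is the
    KL divergence from the student's softened distribution to that ensemble,
    averaged over the batch and scaled by T^2 as in standard distillation.
    """
    teacher = np.mean([softmax(t, T) for t in interim_logits_list], axis=0)
    student = softmax(student_logits, T)
    kl = np.sum(teacher * (np.log(teacher + 1e-12) - np.log(student + 1e-12)))
    return float((T ** 2) * kl / teacher.shape[0])
```

When the student already matches every interim sub-network, the loss is zero; any disagreement with the ensemble yields a positive penalty, which is what guides the pruned network during fine-tuning in this sketch.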


