Ensemble Knowledge Guided Sub-network Search and Fine-tuning for Filter Pruning

03/05/2022
by   Seunghyun Lee, et al.

Conventional NAS-based pruning algorithms aim to find the sub-network with the best validation performance. However, validation performance does not reliably represent test performance, i.e., potential performance. In addition, although fine-tuning the pruned network to recover the performance drop is an inevitable step, few studies have addressed this issue. This paper proposes a novel sub-network search and fine-tuning method named Ensemble Knowledge Guidance (EKG). First, we experimentally show that the fluctuation of the loss landscape is an effective metric for evaluating potential performance. To search for a sub-network with the smoothest loss landscape at low cost, we propose a pseudo-supernet built by ensemble sub-network knowledge distillation. Next, we propose a novel fine-tuning scheme that re-uses information from the search phase: we store the interim sub-networks, i.e., the by-products of the search, and transfer their knowledge into the pruned network. Note that EKG is easy to plug in and computationally efficient. For example, in the case of ResNet-50, about 45% of FLOPS are removed without any performance drop in only 315 GPU hours. The implemented code is available at https://github.com/sseung0703/EKG.
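The core idea of transferring ensemble knowledge from the stored interim sub-networks can be sketched as a standard distillation loss whose teacher distribution is the average of the interim sub-networks' softened outputs. This is a minimal illustration, not the paper's exact formulation; the function names, temperature value, and logit shapes here are assumptions for the sketch.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-softened softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_kd_loss(student_logits, interim_logits_list, T=4.0):
    """Hypothetical ensemble-distillation loss.

    The teacher is the average of the softened distributions produced by the
    interim sub-networks stored during the search phase; the loss is the
    KL divergence from the student's softened distribution to that ensemble,
    averaged over the batch and scaled by T^2 as in standard distillation.
    """
    teacher = np.mean([softmax(t, T) for t in interim_logits_list], axis=0)
    student = softmax(student_logits, T)
    kl = np.sum(teacher * (np.log(teacher + 1e-12) - np.log(student + 1e-12)))
    return float((T ** 2) * kl / teacher.shape[0])
```

When the student already matches every interim sub-network, the loss is zero; any disagreement with the ensemble yields a positive penalty, which is what guides the pruned network during fine-tuning in this sketch.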


