Joint Channel and Weight Pruning for Model Acceleration on Mobile Devices

by Tianli Zhao, et al.

For practical deep neural network design on mobile devices, it is essential to consider the constraints imposed by computational resources and inference latency in various applications. Among approaches to deep network acceleration, pruning is a widely adopted practice for balancing computational resource consumption against accuracy: unimportant connections can be removed either channel-wise or randomly with minimal impact on model accuracy. Channel pruning immediately yields a significant latency reduction, while random weight pruning is more flexible in balancing latency and accuracy. In this paper, we present a unified framework with Joint Channel pruning and Weight pruning (JCW), which achieves a better Pareto frontier between latency and accuracy than previous model compression approaches. To fully optimize the trade-off between latency and accuracy, we develop a tailored multi-objective evolutionary algorithm in the JCW framework, which enables a single search to obtain the optimal candidate architectures for various deployment requirements. Extensive experiments demonstrate that JCW achieves a better trade-off between latency and accuracy than various state-of-the-art pruning methods on the ImageNet classification dataset. Our code is available at
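To make the notion of a Pareto frontier between latency and accuracy concrete, the following is a minimal sketch and not the authors' evolutionary algorithm. It only shows how non-dominated candidates could be filtered from a pool and how one might be picked for a given latency budget. The `Candidate` type, the candidate names, and all latency and accuracy values are hypothetical and chosen purely for illustration.

```python
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    """A pruned-model candidate (hypothetical values, for illustration only)."""
    name: str
    latency_ms: float  # lower is better
    accuracy: float    # higher is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` on both objectives and better on one."""
    no_worse = a.latency_ms <= b.latency_ms and a.accuracy >= b.accuracy
    strictly_better = a.latency_ms < b.latency_ms or a.accuracy > b.accuracy
    return no_worse and strictly_better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that no other candidate dominates."""
    front = [c for c in cands if not any(dominates(o, c) for o in cands)]
    return sorted(front, key=lambda c: c.latency_ms)


def best_under_budget(front: List[Candidate], budget_ms: float) -> Optional[Candidate]:
    """Most accurate candidate whose latency fits the budget, if any."""
    feasible = [c for c in front if c.latency_ms <= budget_ms]
    return max(feasible, key=lambda c: c.accuracy) if feasible else None


# Hypothetical pool of candidates with made-up measurements.
candidates = [
    Candidate("A", 10.0, 0.60),
    Candidate("B", 15.0, 0.68),
    Candidate("C", 16.0, 0.66),  # dominated by B (slower and less accurate)
    Candidate("D", 25.0, 0.72),
    Candidate("E", 30.0, 0.71),  # dominated by D (slower and less accurate)
]

front = pareto_front(candidates)
front_names = [c.name for c in front]
choice = best_under_budget(front, budget_ms=20.0)
```

Once such a front is available, serving a different deployment requirement amounts to calling `best_under_budget` with a different budget, which is the practical appeal of obtaining the whole front from a single search.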




