Differentiable Sparsification for Deep Neural Networks
A deep neural network has relieved the burden of feature engineering by human experts, but comparable efforts are instead required to determine an effective architecture. On the other hands, as the size of a network has over-grown, a lot of resources are also invested to reduce its size. These problems can be addressed by sparsification of an over-complete model, which removes redundant parameters or connections by pruning them away after training or encouraging them to become zero during training. In general, however, these approaches are not fully differentiable and interrupt an end-to-end training process with the stochastic gradient descent in that they require either a parameter selection or a soft-thresholding step. In this paper, we propose a fully differentiable sparsification method for deep neural networks, which allows parameters to be exactly zero during training, and thus can learn the sparsified structure and the weights of networks simultaneously using the stochastic gradient descent. We apply the proposed method to various popular models in order to show its effectiveness.
READ FULL TEXT