Sparse Weight Activation Training

01/07/2020
by Md Aamir Raihan, et al.

Training convolutional neural networks (CNNs) is time-consuming. Prior work has explored how to reduce the computational demands of training by eliminating gradients with relatively small magnitude. We show that eliminating small magnitude components has limited impact on the direction of high-dimensional vectors. However, in the context of training a CNN, we find that eliminating small magnitude components of weight and activation vectors allows us to train deeper networks on more complex datasets than eliminating small magnitude components of gradients. We propose Sparse Weight Activation Training (SWAT), an algorithm that embodies these observations. SWAT reduces computations by 50% to 80% compared with the Dynamic Sparse Graph algorithm. SWAT also reduces memory footprint by 23% for activations and 50% for weights.
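The core operation the abstract describes is magnitude-based sparsification of weight and activation tensors during training. The PyTorch sketch below illustrates one way such a step could look; the function name drop_small_magnitude, the sparsity level, and the tensor shapes are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn.functional as F

    def drop_small_magnitude(x: torch.Tensor, sparsity: float) -> torch.Tensor:
        """Zero out the smallest-magnitude entries of x.

        Keeps roughly the top (1 - sparsity) fraction of entries by absolute value.
        Illustrative sketch only; not the paper's reference implementation.
        """
        k = int(sparsity * x.numel())
        if k == 0:
            return x
        # Threshold is the k-th smallest absolute value; entries at or below it are dropped.
        threshold = torch.kthvalue(x.abs().flatten(), k).values
        return x * (x.abs() > threshold).to(x.dtype)

    # Example: sparsify a convolution's weights and input activations before the forward pass.
    weight = torch.randn(16, 3, 3, 3)        # hypothetical conv filter bank
    activation = torch.randn(8, 3, 32, 32)   # hypothetical input batch
    sparse_w = drop_small_magnitude(weight, sparsity=0.8)
    sparse_a = drop_small_magnitude(activation, sparsity=0.8)
    out = F.conv2d(sparse_a, sparse_w, padding=1)

Under this reading, the forward (and backward) computation operates on the sparsified tensors, which is where the reported reductions in computation and memory footprint would come from.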
