Sparse Weight Activation Training

01/07/2020

Training convolutional neural networks (CNNs) is time-consuming. Prior work has explored reducing the computational demands of training by eliminating gradients with relatively small magnitude. We show that eliminating small-magnitude components has limited impact on the direction of high-dimensional vectors. However, in the context of training a CNN, we find that eliminating small-magnitude components of weight and activation vectors, rather than of gradients, allows us to train deeper networks on more complex datasets. We propose Sparse Weight Activation Training (SWAT), an algorithm that embodies these observations. SWAT reduces computations by 50 to 80 [...] Sparse Graph algorithm. SWAT also reduces memory footprint by 23 [...] activations and 50 [...]
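The claim that dropping small-magnitude components barely changes the direction of a high-dimensional vector can be checked numerically. The sketch below is only an illustration under stated assumptions, not the SWAT algorithm itself: it draws a random Gaussian vector as a stand-in for a weight vector, zeroes the smallest-magnitude entries, and measures the cosine similarity to the dense original. The helper names `topk_sparsify` and `cosine_similarity`, the vector length, and the 80% sparsity level are all hypothetical choices made for this example.

```python
import math
import random


def topk_sparsify(vec, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the entries.

    Hypothetical helper for illustration; not taken from the paper.
    """
    n_drop = int(len(vec) * sparsity)
    if n_drop == 0:
        return list(vec)
    # Indices ordered by increasing magnitude; the first n_drop are dropped.
    order = sorted(range(len(vec)), key=lambda i: abs(vec[i]))
    dropped = set(order[:n_drop])
    return [0.0 if i in dropped else x for i, x in enumerate(vec)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Assumption: Gaussian entries stand in for a real weight vector.
rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

sparse_weights = topk_sparsify(weights, sparsity=0.8)
kept = sum(x != 0.0 for x in sparse_weights)
similarity = cosine_similarity(weights, sparse_weights)

print(f"kept {kept} of {len(weights)} entries")
print(f"cosine similarity to dense vector: {similarity:.3f}")
```

For Gaussian entries, keeping only the largest 20% of components still gives a cosine similarity of around 0.8, because the retained entries carry most of the vector's squared norm. Real weight and activation distributions may behave differently.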
