Convergence of Deep Neural Networks to a Hierarchical Covariance Matrix Decomposition

03/14/2017
by Nima Dehmamy, et al.

We show that in a deep neural network trained with ReLU activations, the low-lying layers should be replaceable with layers that use a truncated linear activation. We derive the gradient descent equations for this truncated linear model and demonstrate that, if the distribution of the training data is stationary during training, the optimal weights for these low-lying layers are the eigenvectors of the covariance matrix of the data. If the training data is sufficiently random and uniform, these eigenvectors can be found from a small fraction of the training data, reducing the computational cost of training. We show how this procedure can be applied recursively to form successive trained layers. Our tests show that, at least for the first layer, this approach improves image classification while reducing network size.
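The procedure sketched in the abstract can be illustrated in a few lines of NumPy: estimate the covariance matrix from a small subsample of the data, take its leading eigenvectors as the layer weights, and recurse on the layer's output to form the next layer. This is a minimal sketch, not the authors' implementation; the function names (covariance_eigen_layer, build_layers) and the sample_frac parameter are illustrative assumptions, and the truncated linear activation is approximated here by a plain linear map.

import numpy as np

def covariance_eigen_layer(X, n_components, sample_frac=0.1, seed=0):
    """Estimate layer weights as the leading eigenvectors of the data
    covariance matrix, computed from a small random subsample of X."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=max(1, int(sample_frac * n)), replace=False)
    Xs = X[idx] - X[idx].mean(axis=0)        # center the subsample
    cov = Xs.T @ Xs / len(idx)               # empirical covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # re-sort to descending
    return eigvecs[:, order[:n_components]]  # (n_features, n_components)

def build_layers(X, layer_sizes, sample_frac=0.1, seed=0):
    """Recursively form successive layers: each layer's weights are the
    leading covariance eigenvectors of the previous layer's output."""
    layers, H = [], X
    for k in layer_sizes:
        W = covariance_eigen_layer(H, k, sample_frac, seed)
        layers.append(W)
        H = H @ W   # linear forward pass feeds the next layer's covariance
    return layers

# Example: two covariance-derived layers on stand-in data shaped like
# flattened 28x28 images (random here, for illustration only).
X = np.random.randn(10000, 784)
layers = build_layers(X, layer_sizes=[128, 64])

Because the eigenvectors come from a subsample, the cost of forming each layer scales with sample_frac times the dataset size, rather than requiring full gradient descent over all low-lying weights.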

