Does Preprocessing Help Training Over-parameterized Neural Networks?
Deep neural networks have achieved impressive performance in many areas. Designing a fast and provable training method for neural networks is a fundamental question in machine learning. The classical training method requires Ω(mnd) cost for both the forward and the backward computation, where m is the width of the neural network and we are given n training points in d-dimensional space. In this paper, we propose two novel preprocessing ideas to bypass this Ω(mnd) barrier:

∙ First, by preprocessing the initial weights of the neural network, we can train the neural network at a cost of O(m^{1-Θ(1/d)} nd) per iteration.

∙ Second, by preprocessing the input data points, we can train the neural network at a cost of O(m^{4/5} nd) per iteration.

From a technical perspective, our results are a sophisticated combination of tools from different fields: greedy-type convergence analysis in optimization, sparsity observations from practical work, high-dimensional geometric search from data structures, and concentration and anti-concentration results from probability. Our results also provide theoretical insights into a large number of previously established fast training methods. In addition, our classical algorithm can be generalized to the quantum computation model. Interestingly, we can obtain a similar sublinear cost per iteration while avoiding the preprocessing of the initial weights or the input data points.
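To make the sparsity intuition behind the sublinear-cost claims concrete, the sketch below is an illustrative toy example, not the paper's algorithm or its exact parameters. It assumes a randomly initialized wide layer with a thresholded ("shifted") ReLU, one common setting in which each input fires only a small fraction of the m neurons; the threshold value and layer sizes here are made up for demonstration. The paper's preprocessing ideas exploit such sparsity by locating the fired neurons with a high-dimensional geometric search structure rather than scanning all m of them; this sketch only verifies the sparsity and checks that a forward pass restricted to the active set matches the dense one.

```python
import numpy as np

# Illustrative sketch only (assumed parameters, not the paper's method):
# with random Gaussian weights and a shifted ReLU, few neurons fire per input.

rng = np.random.default_rng(0)
m, n, d = 4096, 8, 16                      # width, #points, dimension (toy sizes)
b = np.sqrt(0.4 * np.log(m))               # assumed activation threshold
W = rng.standard_normal((m, d))            # random hidden-layer weights
a = rng.choice([-1.0, 1.0], size=m)        # random output-layer signs
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)

for x in X:
    pre = W @ x                            # dense forward pass: Theta(m d) work
    active = np.flatnonzero(pre > b)       # neurons whose shifted ReLU fires
    y_dense = a @ np.maximum(pre - b, 0.0)
    y_sparse = a[active] @ (pre[active] - b)   # same output using only the active set
    assert np.isclose(y_dense, y_sparse)
    print(f"fired neurons: {len(active)} of {m}")
```

In this toy run only a few percent of the neurons fire per input, so if the active set could be found without touching all m neurons (the role played by the geometric search data structure in the paper), the per-input work would drop well below Θ(md).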