The Convergence Rate of Neural Networks for Learned Functions of Different Frequencies

by Ronen Basri et al.

We study the relationship between the frequency of a function and the speed at which a neural network learns it. We build on recent results showing that the dynamics of overparameterized neural networks trained with gradient descent are well approximated by a linear system. When normalized training data are uniformly distributed on a hypersphere, the eigenfunctions of this linear system are spherical harmonic functions. After introducing a bias term into the model, we derive the corresponding eigenvalue for each frequency. This bias term had been omitted from the linearized network model without significantly affecting previous theoretical results. However, we show theoretically and experimentally that, in the limit of large amounts of data, a shallow neural network without bias cannot learn simple, low-frequency functions with odd frequencies. Our results let us make specific predictions of the time a network with bias will take to learn functions of varying frequency, and these predictions match the behavior of real shallow and deep networks.
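The frequency-dependent convergence rates described above can be illustrated with a small random-features experiment, a common stand-in for the linearized regime the abstract refers to. Everything below is an illustrative sketch rather than the paper's own setup: a frozen random ReLU first layer (with a bias term) feeds a linear readout trained by gradient descent on a 1D target that mixes a low and a high frequency; projecting the residual onto each component shows the low-frequency part being learned much faster.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1D inputs on [-1, 1]; the target mixes a low and a high frequency.
n, width = 256, 4096
x = np.linspace(-1.0, 1.0, n)
low = np.sin(2 * np.pi * 1 * x)   # low-frequency component
high = np.sin(2 * np.pi * 8 * x)  # high-frequency component
y = low + high

# Random ReLU features with bias; the first layer stays frozen, so
# training the readout is a linear problem (linearized-regime analogue).
W = rng.normal(size=width)
b = rng.normal(size=width)
phi = np.maximum(x[:, None] * W[None, :] + b[None, :], 0.0) / np.sqrt(width)

# Full-batch gradient descent on the squared loss over the readout weights.
a = np.zeros(width)
lr, steps = 1.0, 3000
for _ in range(steps):
    resid = phi @ a - y
    a -= lr * (phi.T @ resid) / n

# Relative residual energy in each frequency component after training.
resid = phi @ a - y
err_low = abs(resid @ low) / (low @ low)
err_high = abs(resid @ high) / (high @ high)
print(f"low-frequency residual:  {err_low:.3f}")
print(f"high-frequency residual: {err_high:.3f}")
```

Under these (assumed) hyperparameters, the low-frequency residual shrinks far sooner than the high-frequency one, mirroring the eigenvalue ordering the paper derives: each frequency decays at a rate set by the corresponding eigenvalue of the linearized dynamics.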




