LU decomposition and Toeplitz decomposition of a neural network

by Yucong Liu, et al.

It is well-known that any matrix A has an LU decomposition. Less well-known is the fact that it has a 'Toeplitz decomposition' A = T_1 T_2 ⋯ T_r where the T_i's are Toeplitz matrices. We will prove that any continuous function f : ℝ^n → ℝ^m has an approximation to arbitrary accuracy by a neural network that takes the form L_1 σ_1 U_1 σ_2 L_2 σ_3 U_2 ⋯ L_r σ_{2r-1} U_r, i.e., where the weight matrices alternate between lower and upper triangular matrices, σ_i(x) := σ(x - b_i) for some bias vector b_i, and the activation σ may be chosen to be essentially any uniformly continuous nonpolynomial function. The same result also holds with Toeplitz matrices, i.e., f ≈ T_1 σ_1 T_2 σ_2 ⋯ σ_{r-1} T_r to arbitrary accuracy, and likewise for Hankel matrices. A consequence of our Toeplitz result is a fixed-width universal approximation theorem for convolutional neural networks, which so far have only arbitrary-width versions. Since our results apply in particular to the case when f is a general neural network, we may regard them as LU and Toeplitz decompositions of a neural network. The practical implication of our results is that one may vastly reduce the number of weight parameters in a neural network without sacrificing its power of universal approximation. We will present several experiments on real data sets to show that imposing such structures on the weight matrices sharply reduces the number of training parameters with almost no noticeable effect on test accuracy.
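To make the structures in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes square n × n weight matrices and tanh as one example of a uniformly continuous nonpolynomial activation. The helper names (`toeplitz`, `lower`, `upper`, `lu_block`, `param_counts`) are hypothetical and chosen for illustration only. It builds a Toeplitz matrix from its 2n - 1 free entries, evaluates one block of the form L σ_1 U with σ_1(x) = σ(x - b_1), and counts free weights per matrix for each structure.

```python
import math

def toeplitz(params, n):
    """n x n Toeplitz matrix from 2n-1 values: entry (i, j) depends only on i - j."""
    assert len(params) == 2 * n - 1
    return [[params[i - j + n - 1] for j in range(n)] for i in range(n)]

def lower(a):
    """Zero out entries above the diagonal (lower triangular part)."""
    n = len(a)
    return [[a[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]

def upper(a):
    """Zero out entries below the diagonal (upper triangular part)."""
    n = len(a)
    return [[a[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]

def matvec(a, x):
    """Plain matrix-vector product."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def shifted_tanh(x, b):
    """sigma_i(x) = sigma(x - b_i), with sigma = tanh (an assumed choice)."""
    return [math.tanh(xi - bi) for xi, bi in zip(x, b)]

def lu_block(l, u, b, x):
    """One block L sigma(U x - b); the rightmost factor acts on x first."""
    return matvec(l, shifted_tanh(matvec(u, x), b))

def param_counts(n):
    """Free weights in one n x n matrix under each structure."""
    return {"dense": n * n, "triangular": n * (n + 1) // 2, "toeplitz": 2 * n - 1}
```

For n = 4, a dense matrix has 16 free weights, a triangular one 10, and a Toeplitz one 7, which illustrates where the parameter savings described in the abstract come from.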




Universal Approximation with Deep Narrow Networks

The classical Universal Approximation Theorem certifies that the univers...

Universal Property of Convolutional Neural Networks

Universal approximation, whether a set of functions can approximate an a...

Neural Network Approximation: Three Hidden Layers Are Enough

A three-hidden-layer neural network with super approximation power is in...

A closer look at the approximation capabilities of neural networks

The universal approximation theorem, in one of its most general versions...

Neural Network Layer Matrix Decomposition reveals Latent Manifold Encoding and Memory Capacity

We prove the converse of the universal approximation theorem, i.e. a neu...

Deep Network Approximation: Achieving Arbitrary Accuracy with Fixed Number of Neurons

This paper develops simple feed-forward neural networks that achieve the...

Memory Capacity of a Random Neural Network

This paper considers the problem of information capacity of a random neu...
