Width is Less Important than Depth in ReLU Neural Networks

02/08/2022
by Gal Vardi, et al.

We solve an open question from Lu et al. (2017) by showing that any target network with inputs in ℝ^d can be approximated by a network of width O(d) (independent of the target network's architecture) whose number of parameters is larger by essentially only a linear factor. In light of previous depth separation theorems, which imply that a similar result cannot hold when the roles of width and depth are interchanged, it follows that depth plays a more significant role than width in the expressive power of neural networks. We extend our results to constructing networks with bounded weights, and to constructing networks with width at most d+2, which is close to the minimal possible width due to previous lower bounds. Both of these constructions incur an extra polynomial factor in the number of parameters relative to the target network. We also give an exact representation of wide, shallow networks by deep, narrow networks which, in certain cases, does not increase the number of parameters over the target network.
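The last claim, that wide, shallow networks can be represented exactly by deep, narrow ones, can be illustrated with a small sketch. The code below is not the paper's construction (which achieves width d+2 and tighter parameter counts); it is a simplified variant of the standard "compute one hidden unit at a time" idea, using width d+3 and assuming inputs lie in [0,1]^d. All names (target, narrow_deep, W, b, v) are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x, W, b, v):
    """Wide, shallow target network: f(x) = sum_i v_i * relu(w_i . x + b_i)."""
    return v @ np.maximum(W @ x + b, 0.0)

def narrow_deep(x, W, b, v):
    """Emulate the target with a narrow, deep ReLU network of width d + 3.

    State channels: [x (d entries), scratch u, accumulator P, accumulator M].
    Inputs are assumed to lie in [0, 1]^d, so the x channels pass through ReLU
    unchanged; P and M stay nonnegative by construction, so they do too.
    Each target neuron costs two narrow layers.
    """
    d = x.shape[0]
    h = np.concatenate([x, [0.0, 0.0, 0.0]])  # width-(d + 3) hidden state
    for w_i, b_i, v_i in zip(W, b, v):
        # Layer A (affine + ReLU): write w_i . x + b_i into the scratch channel.
        pre = np.concatenate([h[:d], [w_i @ h[:d] + b_i], h[d + 1:]])
        h = np.maximum(pre, 0.0)
        # Layer B (affine + ReLU): fold the unit into sign-split accumulators,
        # so both running sums remain nonnegative and survive the ReLU.
        P = h[d + 1] + max(v_i, 0.0) * h[d]
        M = h[d + 2] + max(-v_i, 0.0) * h[d]
        h = np.maximum(np.concatenate([h[:d], [0.0], [P, M]]), 0.0)
    return h[d + 1] - h[d + 2]  # linear output layer: P - M

# Sanity check: the two networks agree up to floating-point error.
d, N = 4, 16
W = rng.normal(size=(N, d))
b = rng.normal(size=N)
v = rng.normal(size=N)
x = rng.uniform(size=d)
assert np.isclose(target(x, W, b, v), narrow_deep(x, W, b, v))
```

The sign-split accumulators are what keep the running sum nonnegative, so the ReLU acts as the identity on the carried state; the paper's constructions are more economical in both width and parameters, but the sketch shows the basic sense in which depth can substitute for width.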


Related research

10/27/2020 · Are wider nets better given the same number of parameters?
Empirical studies demonstrate that the performance of neural networks im...

11/30/2022 · Average Path Length: Sparsification of Nonlinearities Creates Surprisingly Shallow Networks
We perform an empirical study of the behaviour of deep networks when pus...

03/01/2021 · Computing the Information Content of Trained Neural Networks
How much information does a learning algorithm extract from the training...

10/26/2020 · Provable Memorization via Deep Neural Networks using Sub-linear Parameters
It is known that Θ(N) parameters are sufficient for neural networks to m...

04/03/2017 · Truncating Wide Networks using Binary Tree Architectures
Recent study shows that a wide deep network can obtain accuracy comparab...

01/18/2021 · A simple geometric proof for the benefit of depth in ReLU networks
We present a simple proof for the benefit of depth in multi-layer feedfo...

06/23/2021 · Adversarial Examples in Multi-Layer Random ReLU Networks
We consider the phenomenon of adversarial examples in ReLU networks with...
