Depth Separation for Neural Networks
Let $f:S^{d-1}\times S^{d-1}\to\mathbb{R}$ be a function of the form $f(x,x') = g(\langle x,x'\rangle)$ for $g:[-1,1]\to\mathbb{R}$. We give a simple proof that poly-size depth-two neural networks with (exponentially) bounded weights cannot approximate $f$ whenever $g$ cannot be approximated by a low-degree polynomial. Moreover, for many choices of $g$, such as $g(x)=\sin(\pi d^3 x)$, the number of neurons must be $2^{\Omega(d\log(d))}$. Furthermore, the result holds w.r.t. the uniform distribution on $S^{d-1}\times S^{d-1}$. Since many functions of the above form can be well approximated by poly-size depth-three networks with poly-bounded weights, this establishes a separation between depth-two and depth-three networks w.r.t. the uniform distribution on $S^{d-1}\times S^{d-1}$.
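The lower bound is tied to whether $g$ can be approximated by a low-degree polynomial under the distribution of $\langle x,x'\rangle$. The Python sketch below is a hedged numerical illustration of that ingredient only, not the paper's proof. It assumes NumPy, uses hypothetical helper names (`sample_sphere`, `target_f`, `inner_product_weights`, `best_poly_relative_error`), and substitutes a grid-based weighted least-squares fit in the Legendre basis for the exact best-approximation error. It relies on the standard fact that, for independent uniform $x,x'$ on $S^{d-1}$, $\langle x,x'\rangle$ has density proportional to $(1-t^2)^{(d-3)/2}$ on $[-1,1]$.

```python
# Hypothetical helpers: a numerical illustration, not the paper's method.
import numpy as np
from numpy.polynomial import legendre


def sample_sphere(n, d, rng):
    """Draw n points uniformly from the unit sphere S^{d-1} in R^d."""
    x = rng.standard_normal((n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def g(t, d):
    """Example link function from the abstract: g(t) = sin(pi * d^3 * t)."""
    return np.sin(np.pi * d**3 * t)


def target_f(x, x_prime, d):
    """f(x, x') = g(<x, x'>), evaluated row-wise on paired samples."""
    return g(np.sum(x * x_prime, axis=1), d)


def inner_product_weights(t, d):
    """Unnormalized density of <x, x'> for x, x' independent uniform on S^{d-1}.

    By rotation invariance this equals the law of one coordinate of a uniform
    point on the sphere, proportional to (1 - t^2)^((d - 3) / 2).
    """
    return (1.0 - t**2) ** ((d - 3) / 2.0)


def best_poly_relative_error(d, degree, n_grid=20001):
    """Relative weighted L2 error of a least-squares degree-`degree` fit to g.

    Discretizes [-1, 1] on a uniform grid (an assumption: n_grid must resolve
    the oscillations of g) and solves weighted least squares in the Legendre
    basis, so this only approximates the true best-approximation error.
    """
    t = np.linspace(-1.0, 1.0, n_grid)
    w = inner_product_weights(t, d)
    y = g(t, d)
    basis = legendre.legvander(t, degree)  # shape (n_grid, degree + 1)
    sqrt_w = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(basis * sqrt_w[:, None], y * sqrt_w, rcond=None)
    residual = y - basis @ coef
    return float(np.sqrt(np.sum(w * residual**2) / np.sum(w * y**2)))


if __name__ == "__main__":
    d = 3  # g oscillates with angular frequency pi * d^3 here
    for degree in (10, 40, 80, 120, 150):
        print(degree, best_poly_relative_error(d, degree))
```

For $d=3$ one expects the relative error to stay close to 1 for degrees well below $\pi d^3 \approx 85$ and to collapse once the degree clearly exceeds it. The weighting by the density of $\langle x,x'\rangle$ mirrors the abstract's use of the uniform distribution on $S^{d-1}\times S^{d-1}$; larger $d$ needs a finer grid.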