Representational Power of ReLU Networks and Polynomial Kernels: Beyond Worst-Case Analysis
There has been a large amount of interest, both in the past and particularly recently, in the power of different families of universal approximators, e.g. ReLU networks, polynomials, and rational functions. However, current research has focused almost exclusively on understanding this problem in a worst-case setting, e.g. bounding the error of the best infinity-norm approximation in a box. In this setting, even approximating a single ReLU requires a high-degree polynomial. However, in real applications with high-dimensional data, we expect that it is only important to approximate the desired function well on certain relevant parts of its domain. With this motivation, we analyze the ability of neural networks and polynomial kernels of bounded degree to achieve good statistical performance on a simple, natural inference problem with sparse latent structure. We give almost-tight bounds on the performance of both neural networks and low-degree polynomials for this problem. Our bounds for polynomials involve new techniques which may be of independent interest, and they show major qualitative differences from what is known in the worst-case setting.