Optimization-Based Separations for Neural Networks

12/04/2021
by Itay Safran, et al.

Depth separation results propose a possible theoretical explanation for the benefits of deep neural networks over shallower architectures, establishing that the former possess superior approximation capabilities. However, there are no known results in which the deeper architecture leverages this advantage into a provable optimization guarantee. We prove that when the data are generated by a distribution with radial symmetry that satisfies some mild assumptions, gradient descent can efficiently learn ball indicator functions using a depth 2 neural network with two layers of sigmoidal activations, where the hidden layer is held fixed throughout training. Since it is known that ball indicators are hard to approximate with respect to a certain heavy-tailed distribution when using depth 2 networks with a single layer of non-linearities (Safran and Shamir, 2017), this establishes what is, to the best of our knowledge, the first optimization-based separation result where the approximation benefits of the stronger architecture provably manifest in practice. Our proof technique relies on a random features approach that reduces the problem to learning with a single neuron, where new tools are required to show the convergence of gradient descent when the distribution of the data is heavy-tailed.
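To make the training setup concrete, here is a minimal sketch of the kind of architecture described above: a network with two layers of sigmoidal activations whose first (random-feature) layer is frozen, with plain gradient descent applied only to the output layer on a ball-indicator target. The widths, step size, squared loss, and Gaussian data below are illustrative assumptions; this is not the construction or the heavy-tailed distribution analyzed in the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
d, k, r = 2, 200, 1.0           # input dim, number of random features, ball radius
n, steps, lr = 2000, 500, 0.5   # sample size, GD iterations, step size

# Radially symmetric training data (standard Gaussian as a stand-in for the
# heavy-tailed distributions treated in the paper).
X = rng.standard_normal((n, d))
y = (np.linalg.norm(X, axis=1) <= r).astype(float)   # ball indicator labels

# First layer: random features, held fixed throughout training.
W1 = rng.standard_normal((d, k))
b1 = rng.standard_normal(k)
H = sigmoid(X @ W1 + b1)        # fixed random-feature representation

# Trainable second layer: a single sigmoidal output neuron.
w2 = np.zeros(k)
b2 = 0.0

for _ in range(steps):
    p = sigmoid(H @ w2 + b2)          # network output
    g = (p - y) * p * (1 - p) / n     # gradient of (1/2n)-scaled squared loss wrt pre-activation
    w2 -= lr * (H.T @ g)              # gradient descent on the trained layer only
    b2 -= lr * g.sum()

print("train MSE:", np.mean((sigmoid(H @ w2 + b2) - y) ** 2))
```

Freezing the first layer is what allows the paper's reduction to learning with a single neuron over the fixed random-feature representation; the sketch mirrors that structure but makes no claim about the rates or guarantees proved in the paper.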


Related research

09/23/2018  Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks
07/26/2022  Quiver Neural Networks
10/31/2016  Depth-Width Tradeoffs in Approximating Natural Functions with Neural Networks
07/04/2019  Learning One-hidden-layer Neural Networks via Provable Gradient Descent with Random Initialization
10/26/2020  Provable Memorization via Deep Neural Networks using Sub-linear Parameters
02/02/2021  Depth Separation beyond Radial Functions
07/23/2019  Heavy-ball Algorithms Always Escape Saddle Points
