Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks

09/23/2018
by Ohad Shamir, et al.

In this note, we study the dynamics of gradient descent on objective functions of the form f(∏_{i=1}^k w_i) (with respect to scalar parameters w_1, ..., w_k), which arise in the context of training depth-k linear neural networks. We prove that for standard random initializations, and under mild assumptions on f, the number of iterations required for convergence scales exponentially with the depth k. This highlights a potential obstacle in understanding the convergence of gradient-based methods for deep linear neural networks, where k is large.
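To make the setting concrete, the following minimal sketch (not taken from the paper) runs gradient descent on an objective of this product form, assuming f(x) = (x - 1)^2; the initialization scale, step size, tolerance, and stopping rule are illustrative assumptions, not the paper's exact setup. Printing the iteration counts for a few depths k gives a rough feel for how convergence slows as k grows.

# Sketch: gradient descent on f(prod_i w_i) with f(x) = (x - 1)^2.
# Hypothetical choices: Gaussian init with scale 1/sqrt(k), fixed step size,
# squared-error tolerance as the stopping rule.
import numpy as np

def gd_iterations(k, lr=1e-2, tol=1e-3, max_iters=10**6, seed=0):
    """Return the number of GD iterations until (prod(w) - 1)^2 < tol, or max_iters."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, 1.0 / np.sqrt(k), size=k)  # assumed "standard" random init
    for t in range(max_iters):
        p = np.prod(w)
        if (p - 1.0) ** 2 < tol:
            return t
        # d/dw_i f(prod(w)) = f'(prod(w)) * prod_{j != i} w_j
        grads = 2.0 * (p - 1.0) * np.array([np.prod(np.delete(w, i)) for i in range(k)])
        w -= lr * grads
    return max_iters

for k in (1, 2, 4, 6):
    print(f"depth k = {k}: {gd_iterations(k)} iterations")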


research
11/02/2019

Global Convergence of Gradient Descent for Deep Linear Residual Networks

We analyze the global convergence of gradient descent for deep linear re...
research
12/04/2021

Optimization-Based Separations for Neural Networks

Depth separation results propose a possible theoretical explanation for ...
research
02/18/2020

Learning Parities with Neural Networks

In recent years we see a rapidly growing line of research which shows le...
research
02/19/2018

On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization

Conventional wisdom in deep learning states that increasing depth improv...
research
09/26/2019

The Implicit Bias of Depth: How Incremental Learning Drives Generalization

A leading hypothesis for the surprising generalization of neural network...
research
02/09/2020

On the distance between two neural networks and the stability of learning

How far apart are two neural networks? This is a foundational question i...
research
12/08/2014

Provable Methods for Training Neural Networks with Sparse Connectivity

We provide novel guaranteed approaches for training feedforward neural n...
