Gradient Descent Optimizes Infinite-Depth ReLU Implicit Networks with Linear Widths

by Tianxiang Gao, et al.

Implicit deep learning has recently become popular in the machine learning community because implicit models can match the performance of state-of-the-art deep networks while using significantly less memory and computation. However, our theoretical understanding of when and how first-order methods such as gradient descent (GD) converge on nonlinear implicit networks is limited. Although this type of problem has been studied for standard feed-forward networks, the implicit case remains challenging because implicit networks have infinitely many layers: the corresponding equilibrium equation may admit no solution, or multiple solutions, during training. This paper studies the convergence of both gradient flow (GF) and gradient descent for nonlinear ReLU-activated implicit networks. To address the well-posedness problem, we introduce a fixed scalar that scales the weight matrix of the implicit layer and show that a small enough scaling constant exists which keeps the equilibrium equation well-posed throughout training. As a result, we prove that both GF and GD converge to a global minimum at a linear rate, provided the width m of the implicit network is linear in the sample size N, i.e., m = Ω(N).
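To illustrate the well-posedness mechanism the abstract describes, here is a minimal NumPy sketch (not the authors' exact construction) of a ReLU equilibrium layer z = ReLU(γ·A z + B x). Because ReLU is 1-Lipschitz, choosing the scaling constant γ so that γ·‖A‖₂ < 1 makes the update map a contraction, so a unique equilibrium exists and plain fixed-point iteration finds it; the names `deq_forward`, `A`, `B`, and `gamma` are illustrative assumptions.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def deq_forward(A, B, x, gamma, tol=1e-8, max_iter=1000):
    """Solve z = ReLU(gamma * A @ z + B @ x) by fixed-point iteration.

    ReLU is 1-Lipschitz, so if gamma * ||A||_2 < 1 the map is a
    contraction and a unique equilibrium exists (Banach fixed point).
    """
    z = np.zeros(A.shape[0])
    for _ in range(max_iter):
        z_next = relu(gamma * (A @ z) + B @ x)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

rng = np.random.default_rng(0)
m, d = 64, 16                                  # width m, input dimension d
A = rng.standard_normal((m, m)) / np.sqrt(m)   # implicit-layer weight matrix
B = rng.standard_normal((m, d)) / np.sqrt(d)   # input injection matrix
x = rng.standard_normal(d)

# Fixed scaling so that gamma * ||A||_2 = 0.5 < 1: the equilibrium
# equation stays well-posed regardless of how A was drawn.
gamma = 0.5 / np.linalg.norm(A, 2)
z_star = deq_forward(A, B, x, gamma)

# Verify z_star is indeed an equilibrium of the scaled equation.
residual = np.linalg.norm(z_star - relu(gamma * (A @ z_star) + B @ x))
```

In the paper's setting the same idea is applied throughout training: as long as the scaling keeps the implicit layer contractive, every gradient step operates on a well-defined equilibrium.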


Related papers:

- A global convergence theory for deep ReLU implicit networks via over-parameterization
- On the optimization and generalization of overparameterized implicit neural networks
- On the Proof of Global Convergence of Gradient Descent for Deep ReLU Networks with Linear Widths
- Implicit regularization of deep residual networks towards neural ODEs
- Global Convergence of Over-parameterized Deep Equilibrium Models
- Random Walk Initialization for Training Very Deep Feedforward Networks
- Mixing Implicit and Explicit Deep Learning with Skip DEQs and Infinite Time Neural ODEs (Continuous DEQs)
