Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime

06/22/2020
by Atsushi Nitanda, et al.

We analyze the convergence of averaged stochastic gradient descent for over-parameterized two-layer neural networks on regression problems. It was recently shown that, under the neural tangent kernel (NTK) regime, in which the learning dynamics of over-parameterized neural networks are essentially characterized by those of the associated reproducing kernel Hilbert space (RKHS), the NTK plays an important role in establishing the global convergence of gradient-based methods. However, a precise convergence rate analysis in the NTK regime has been lacking. In this study, we prove the global convergence of averaged stochastic gradient descent and derive the optimal convergence rate by exploiting the complexities of the target function and of the RKHS associated with the NTK. Moreover, we show that, under certain conditions, a target function specified by the NTK of a ReLU network can be learned at the optimal convergence rate through a smooth approximation of the ReLU network.
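The setting described in the abstract, one-pass averaged stochastic gradient descent on an over-parameterized two-layer network with a smoothed ReLU activation, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the width, step size, softplus smoothing level, synthetic single-index target, and the Polyak-Ruppert averaging scheme are all assumptions made for this example.

```python
# Minimal sketch (not the authors' code): one-pass averaged SGD on an
# over-parameterized two-layer network with a softplus-smoothed ReLU.
import numpy as np

rng = np.random.default_rng(0)
d, m = 5, 2048                                   # input dimension, network width

# NTK-style initialization: random first layer, fixed +-1/sqrt(m) output layer.
W = rng.normal(size=(m, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)

def sigma(u, s=0.1):
    """Softplus smoothing of ReLU: s*log(1+exp(u/s)) -> max(u, 0) as s -> 0."""
    return s * np.logaddexp(0.0, u / s)

def dsigma(u, s=0.1):
    """Derivative of the smoothed ReLU (the logistic function)."""
    return 1.0 / (1.0 + np.exp(-u / s))

def f(x, W):
    """Two-layer network f(x) = a^T sigma(W x)."""
    return a @ sigma(W @ x)

w_star = np.ones(d) / np.sqrt(d)                 # synthetic single-index target

def sample():
    """Draw one (x, y) pair: x on the unit sphere, noisy scalar label."""
    x = rng.normal(size=d)
    x /= np.linalg.norm(x)
    y = np.sin(np.pi * (w_star @ x)) + 0.1 * rng.normal()
    return x, y

eta, T = 0.5, 5000
W_avg = W.copy()
for t in range(1, T + 1):
    x, y = sample()
    resid = f(x, W) - y
    grad = resid * (a * dsigma(W @ x))[:, None] * x[None, :]  # d(0.5*resid^2)/dW
    W = W - eta * grad                           # one SGD step on a fresh sample
    W_avg += (W - W_avg) / (t + 1)               # running (Polyak-Ruppert) average

# Evaluate the averaged iterate against the noiseless target.
X_test = [sample()[0] for _ in range(500)]
mse = np.mean([(f(x, W_avg) - np.sin(np.pi * (w_star @ x))) ** 2 for x in X_test])
print(f"test MSE of the averaged iterate: {mse:.4f}")
```

Averaging the iterates, rather than returning the last one, is what allows the constant step size to be used while still controlling the variance of the stochastic gradients; the width m only needs to be large enough for the NTK approximation to hold.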
