Convergence Rates of Training Deep Neural Networks via Alternating Minimization Methods

08/30/2022
by Jintao Xu, et al.

Training deep neural networks (DNNs) is an important and challenging optimization problem in machine learning due to its non-convexity and non-separable structure. Alternating minimization (AM) approaches split the composite structure of DNNs and have drawn great interest in the deep learning and optimization communities. In this paper, we propose a unified framework for analyzing the convergence rate of AM-type network training methods. Our analysis is based on the j-step sufficient decrease conditions and the Kurdyka-Łojasiewicz (KL) property, which relax the requirement of designing descent algorithms. We derive the detailed local convergence rates when the KL exponent θ varies in [0,1). Moreover, local R-linear convergence is discussed under a stronger j-step sufficient decrease condition.
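
For context, the standard ingredients behind this kind of analysis can be sketched as follows; the formulas below are a generic illustration (an assumption on our part), not reproduced from the paper itself. A proper lower semicontinuous function f satisfies the KL property at a point x* with exponent θ ∈ [0,1) if, for all x near x* with f(x*) < f(x) < f(x*) + η, one has (f(x) - f(x*))^θ ≤ c · dist(0, ∂f(x)) for some constant c > 0. An illustrative j-step sufficient decrease condition reads f(x^{k+j}) + a · ||x^{k+j} - x^k||^2 ≤ f(x^k) for all k and some a > 0. Under conditions of this type, the classical KL-based results give finite termination when θ = 0, local R-linear convergence when θ ∈ (0, 1/2], and a sublinear rate of order O(k^{-(1-θ)/(2θ-1)}) when θ ∈ (1/2, 1).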


Related research

- A Proximal Block Coordinate Descent Algorithm for Deep Neural Network Training (03/24/2018)
- Tunable Subnetwork Splitting for Model-parallelism of Neural Network Training (09/09/2020)
- Convergence of Random Reshuffling Under The Kurdyka-Łojasiewicz Inequality (10/10/2021)
- Block Coordinate Descent for Deep Learning: Unified Convergence Guarantees (03/01/2018)
- Detailed Proofs of Alternating Minimization Based Trajectory Generation for Quadrotor Aggressive Flight (02/21/2020)
- Deducing Kurdyka-Łojasiewicz exponent via inf-projection (02/10/2019)
- Distributed Training of Deep Neural Networks with Theoretical Analysis: Under SSP Setting (12/09/2015)
