Convergence Acceleration via Chebyshev Step: Plausible Interpretation of Deep-Unfolded Gradient Descent

10/26/2020
by Satoshi Takabe, et al.

Deep unfolding is a promising deep-learning technique whose network architecture is based on expanding the recursive structure of existing iterative algorithms. Although convergence acceleration is a remarkable advantage of deep unfolding, its theoretical aspects have not yet been fully revealed. The first half of this study details the theoretical analysis of the convergence acceleration in deep-unfolded gradient descent (DUGD), whose trainable parameters are step sizes. We propose a plausible interpretation of the learned step-size parameters in DUGD by introducing the principle of Chebyshev steps derived from Chebyshev polynomials. The use of Chebyshev steps in gradient descent (GD) enables us to bound the spectral radius of a matrix governing the convergence speed of GD, leading to a tight upper bound on the convergence rate. The convergence rate of GD using Chebyshev steps is shown to be asymptotically optimal, although it has no momentum terms. We also show that Chebyshev steps numerically explain the learned step-size parameters in DUGD well. In the second half of the study, Chebyshev-periodical successive over-relaxation (Chebyshev-PSOR) is proposed for accelerating linear/nonlinear fixed-point iterations. Theoretical analysis and numerical experiments indicate that Chebyshev-PSOR exhibits significantly faster convergence for various examples such as the Jacobi method and proximal gradient methods.
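
To make the idea of Chebyshev steps concrete, the sketch below applies them to plain gradient descent on a toy quadratic f(x) = 0.5 x^T A x - b^T x. Everything here (the test matrix, the eigenvalue bounds, and the helper names chebyshev_steps and gd) is an illustrative assumption rather than the paper's implementation; it only demonstrates the schedule in which the step sizes are the reciprocals of Chebyshev nodes spanning the spectrum of A.

# A minimal sketch of gradient descent with Chebyshev steps on a toy quadratic,
# assuming the eigenvalue bounds lam_min and lam_max of A are known.
import numpy as np

def chebyshev_steps(lam_min, lam_max, T):
    # Chebyshev nodes on [lam_min, lam_max]; the step sizes are their reciprocals.
    k = np.arange(T)
    nodes = (lam_max + lam_min) / 2 + (lam_max - lam_min) / 2 * np.cos((2 * k + 1) * np.pi / (2 * T))
    return 1.0 / nodes

def gd(A, b, x0, step_sizes):
    # Plain gradient descent with a prescribed per-iteration step size (no momentum).
    x = x0.copy()
    for alpha in step_sizes:
        x = x - alpha * (A @ x - b)   # gradient of 0.5 x^T A x - b^T x
    return x

rng = np.random.default_rng(0)
n, T = 50, 30
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(np.linspace(1.0, 100.0, n)) @ Q.T   # SPD matrix with spectrum in [1, 100]
b = rng.standard_normal(n)
x_star = np.linalg.solve(A, b)

x_cheb = gd(A, b, np.zeros(n), chebyshev_steps(1.0, 100.0, T))
x_const = gd(A, b, np.zeros(n), np.full(T, 2.0 / (1.0 + 100.0)))  # best constant step size

print("error with Chebyshev steps:", np.linalg.norm(x_cheb - x_star))
print("error with constant step:  ", np.linalg.norm(x_const - x_star))

With the condition number of 100 used here, the Chebyshev schedule should reach a much smaller error than the best constant step after the same 30 iterations, illustrating the accelerated rate described above even though no momentum term is used.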

Related research

01/15/2020
Theoretical Interpretation of Learned Step Size in Deep-Unfolded Gradient Descent
Deep unfolding is a promising deep-learning technique in which an iterat...

04/07/2015
From Averaging to Acceleration, There is Only a Step-size
We show that accelerated gradient descent, averaged gradient descent and...

09/03/2019
Fast Gradient Methods with Alignment for Symmetric Linear Systems without Using Cauchy Step
The performance of gradient methods has been considerably improved by th...

12/10/2018
Theoretical Analysis of Auto Rate-Tuning by Batch Normalization
Batch Normalization (BN) has become a cornerstone of deep learning acros...

09/07/2018
An Anderson-Chebyshev Mixing Method for Nonlinear Optimization
Anderson mixing (or Anderson acceleration) is an efficient acceleration ...
01/23/2023
On the Convergence of the Gradient Descent Method with Stochastic Fixed-point Rounding Errors under the Polyak-Lojasiewicz Inequality
When training neural networks with low-precision computation, rounding e...
