Representing smooth functions as compositions of near-identity functions with implications for deep network optimization

04/13/2018
by Peter L. Bartlett et al.

We show that any smooth bi-Lipschitz h can be represented exactly as a composition h_m ∘ ... ∘ h_1 of functions h_1,...,h_m that are close to the identity in the sense that each (h_i-Id) is Lipschitz, and the Lipschitz constant decreases inversely with the number m of functions composed. This implies that h can be represented to any accuracy by a deep residual network whose nonlinear layers compute functions with a small Lipschitz constant. Next, we consider nonlinear regression with a composition of near-identity nonlinear maps. We show that, regarding Fréchet derivatives with respect to the h_1,...,h_m, any critical point of a quadratic criterion in this near-identity region must be a global minimizer. In contrast, if we consider derivatives with respect to parameters of a fixed-size residual network with sigmoid activation functions, we show that there are near-identity critical points that are suboptimal, even in the realizable case. Informally, this means that functional gradient methods for residual networks cannot get stuck at suboptimal critical points corresponding to near-identity layers, whereas parametric gradient methods for sigmoidal residual networks suffer from suboptimal critical points in the near-identity region.
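To make the first result concrete, the following minimal NumPy sketch builds a residual network whose layers are near-identity maps h_i(x) = x + g_i(x), with each residual g_i having Lipschitz constant on the order of 1/m. This is only an illustration of the structure the theorem describes, not the paper's construction: the dimension, depth, tanh residual blocks, and 1/m scaling are all hypothetical choices.

```python
# Illustrative sketch (not the paper's construction): a residual network
# whose nonlinear layers h_i(x) = x + g_i(x) have residuals g_i with
# Lipschitz constant O(1/m). Dimension d, depth m, the tanh blocks, and
# the 1/m scaling are assumed for demonstration only.
import numpy as np

rng = np.random.default_rng(0)
d, m = 3, 10  # input dimension and number of composed layers (assumed)

# Residual blocks g_i(x) = (1/m) * tanh(W_i x + b_i).
# tanh is 1-Lipschitz, so Lip(g_i) <= ||W_i||_2 / m, which shrinks as m grows.
Ws = [rng.standard_normal((d, d)) for _ in range(m)]
bs = [rng.standard_normal(d) for _ in range(m)]

def layer(x, W, b):
    """Near-identity layer h_i(x) = x + (1/m) * tanh(W x + b)."""
    return x + np.tanh(W @ x + b) / m

def network(x):
    """The composition h_m ∘ ... ∘ h_1 applied to a single input x."""
    for W, b in zip(Ws, bs):
        x = layer(x, W, b)
    return x

# Upper bound on Lip(h_i - Id) for each layer: spectral norm of W_i over m.
bounds = [np.linalg.norm(W, 2) / m for W in Ws]
print("max per-layer bound on Lip(h_i - Id):", max(bounds))

x = rng.standard_normal(d)
print("h(x) =", network(x))
```

The paper's theorem guarantees that layers of exactly this near-identity form, with residual Lipschitz constants decaying as 1/m, suffice to represent any smooth bi-Lipschitz h; the sketch above only exhibits the layer structure and the per-layer Lipschitz bound.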



