Predicting the outputs of finite networks trained with noisy gradients

by Gadi Naveh, et al.

A recent line of studies has focused on the infinite width limit of deep neural networks (DNNs), where, under a certain deterministic training protocol, the DNN outputs are related to a Gaussian Process (GP) whose kernel is the Neural Tangent Kernel (NTK). However, finite-width DNNs differ from GPs quantitatively, and for CNNs the difference may be qualitative. Here we present a DNN training protocol involving noise whose outcome can be mapped to a certain non-Gaussian stochastic process. We then introduce an analytical framework for this non-Gaussian process, whose deviation from a GP is controlled by the finite width. Our work extends previous relations between DNNs and GPs in several ways: (a) In the infinite width limit, it establishes a mapping between DNNs and a GP different from the NTK. (b) It allows the general form of the finite width correction (FWC) to be computed analytically for DNNs of arbitrary activation function and depth, and it provides insight into the magnitude and implications of these FWCs. (c) For CNNs, it appears capable of outperforming the corresponding GP. We predict the outputs of empirical finite networks with high accuracy, improving on the accuracy of GP predictions by over an order of magnitude. Overall, we provide a framework that offers both an analytical handle and a more faithful model of real-world settings than previous studies along this line of research.
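The abstract does not spell out the noisy training protocol, so the following is only a minimal sketch under stated assumptions. It assumes the protocol is discretized Langevin dynamics (full-batch gradient descent on a squared loss, plus weight decay, plus isotropic Gaussian noise) applied to a one-hidden-layer ReLU network without biases. The function names (relu_nngp_kernel, gp_posterior_mean, langevin_train), the weight-decay choice T/prior_var, the identification of the GP noise variance with the temperature T, and every hyperparameter are illustrative assumptions, not taken from the paper. With weight decay T/prior_var, the stationary density of the dynamics is proportional to exp(-L/T) times the Gaussian initialization prior. That is a Bayesian posterior with Gaussian likelihood of variance T, so in the infinite-width limit one would expect the time-averaged network output to approach the GP posterior mean computed with the standard infinite-width (arc-cosine) ReLU kernel. The sketch does not verify convergence.

```python
import numpy as np


def relu_nngp_kernel(X1, X2, var_w=1.0, var_a=1.0):
    """Infinite-width output covariance of f(x) = sum_i a_i * relu(w_i . x)
    with w_i ~ N(0, var_w/d I), a_i ~ N(0, var_a/width) and no biases
    (the order-1 arc-cosine kernel)."""
    d = X1.shape[1]
    cross = var_w * (X1 @ X2.T) / d                     # Cov(w.x, w.x')
    s1 = np.sqrt(var_w * np.sum(X1 ** 2, axis=1) / d)   # std of w.x
    s2 = np.sqrt(var_w * np.sum(X2 ** 2, axis=1) / d)
    scale = np.outer(s1, s2)
    cos_t = np.clip(cross / scale, -1.0, 1.0)
    theta = np.arccos(cos_t)
    # Closed form of E[relu(u) relu(v)] for jointly Gaussian (u, v).
    return var_a * scale * (np.sin(theta) + (np.pi - theta) * cos_t) / (2 * np.pi)


def gp_posterior_mean(X_train, y_train, X_test, noise_var):
    """Standard GP regression mean: K_*^T (K + noise_var I)^{-1} y."""
    K = relu_nngp_kernel(X_train, X_train)
    K_star = relu_nngp_kernel(X_test, X_train)
    alpha = np.linalg.solve(K + noise_var * np.eye(len(X_train)), y_train)
    return K_star @ alpha


def langevin_train(X, y, X_test, width=256, steps=4000, lr=5e-4,
                   temperature=0.01, var_w=1.0, var_a=1.0, seed=0):
    """Hypothetical noisy-gradient protocol (assumed, not the paper's code):
    full-batch gradient descent on L = 0.5 * sum (f - y)^2, weight decay
    T / prior_var, and Gaussian noise of variance 2 * T * lr per step.
    Returns test outputs averaged over the second half of the run."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    prior_w, prior_a = var_w / d, var_a / width       # init = prior variances
    W = rng.normal(0.0, np.sqrt(prior_w), size=(width, d))
    a = rng.normal(0.0, np.sqrt(prior_a), size=width)
    noise = np.sqrt(2.0 * temperature * lr)
    outputs = []
    for step in range(steps):
        pre = X @ W.T                                 # (n, width) pre-activations
        H = np.maximum(pre, 0.0)                      # ReLU features
        r = H @ a - y                                 # residuals f(x) - y
        grad_a = H.T @ r
        grad_W = ((r[:, None] * (pre > 0)) * a).T @ X
        # Gradient step + weight decay + Gaussian noise (Euler-Maruyama).
        a = a - lr * (grad_a + temperature / prior_a * a) + noise * rng.standard_normal(a.shape)
        W = W - lr * (grad_W + temperature / prior_w * W) + noise * rng.standard_normal(W.shape)
        if step >= steps // 2:
            outputs.append(np.maximum(X_test @ W.T, 0.0) @ a)
    return np.mean(outputs, axis=0)


# Toy comparison on synthetic data (illustrative only).
rng = np.random.default_rng(1)
X_train = rng.standard_normal((8, 3))
y_train = np.sin(X_train[:, 0])
X_test = rng.standard_normal((4, 3))
T = 0.01
gp_mean = gp_posterior_mean(X_train, y_train, X_test, noise_var=T)
net_mean = langevin_train(X_train, y_train, X_test, temperature=T)
print("GP posterior mean:    ", gp_mean)
print("Langevin time average:", net_mean)
```

The time average over a single finite run is a crude estimate of the stationary mean, so part of any gap between the two printed vectors is Monte Carlo and mixing error, not only a finite-width effect; separating the two would require longer runs, several seeds, and a sweep over width.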


Wide Neural Networks with Bottlenecks are Deep Gaussian Processes

There is recently much work on the "wide limit" of neural networks, wher...

A self consistent theory of Gaussian Processes captures feature learning effects in finite CNNs

Deep neural networks (DNNs) in the infinite width/channel limit have rec...

Wide neural networks: From non-gaussian random fields at initialization to the NTK geometry of training

Recent developments in applications of artificial neural networks with o...

A connection between probability, physics and neural networks

We illustrate an approach that can be exploited for constructing neural ...

Deep neural networks with dependent weights: Gaussian Process mixture limit, heavy tails, sparsity and compressibility

This article studies the infinite-width limit of deep feedforward neural...

The Limitations of Large Width in Neural Networks: A Deep Gaussian Process Perspective

Large width limits have been a recent focus of deep learning research: m...

Adversarial Examples, Uncertainty, and Transfer Testing Robustness in Gaussian Process Hybrid Deep Networks

Deep neural networks (DNNs) have excellent representative power and are ...
