Fast learning rate of deep learning via a kernel perspective

by Taiji Suzuki, et al.

We develop a new theoretical framework for analyzing the generalization error of deep learning, and derive new fast learning rates for two representative algorithms: empirical risk minimization and Bayesian deep learning. A series of theoretical analyses of deep learning has revealed its high expressive power and universal approximation capability. Although these analyses are highly nonparametric, existing generalization error analyses have been developed mainly for fixed-dimensional parametric models. To bridge this gap, we develop an infinite-dimensional model based on an integral form, as used in analyses of the universal approximation capability. This allows us to define a reproducing kernel Hilbert space (RKHS) corresponding to each layer. Our viewpoint is to treat the ordinary finite-dimensional deep neural network as a finite approximation of the infinite-dimensional one. The approximation error is evaluated via the degree of freedom of the RKHS in each layer. To estimate a good finite-dimensional model, we consider both empirical risk minimization and Bayesian deep learning. We derive their generalization error bounds and show that a bias-variance trade-off appears in terms of the number of parameters of the finite-dimensional approximation. We show that the optimal width of the internal layers can be determined through the degree of freedom, and that the convergence rate can be faster than the O(1/√n) rate established in existing studies.
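To make the degree-of-freedom notion concrete, here is a minimal numerical sketch. It uses the standard kernel-ridge definition N(λ) = tr(K(K + nλI)⁻¹) on an empirical Gram matrix; the paper's layer-wise definition over the integral-form RKHS may differ in its details, so treat this as an illustration of the quantity, not the paper's exact construction. The Gaussian kernel and sample here are arbitrary choices for the demo.

```python
import numpy as np

def degrees_of_freedom(K, lam):
    """Empirical degrees of freedom of a kernel Gram matrix K at
    regularization level lam: tr(K (K + n*lam*I)^{-1}).
    This counts the 'effective number of parameters' of kernel
    ridge regression and lies between 0 and n."""
    n = K.shape[0]
    return np.trace(K @ np.linalg.inv(K + n * lam * np.eye(n)))

# Gaussian (RBF) kernel on a 1-D sample (illustrative choice)
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=50)
K = np.exp(-(x[:, None] - x[None, :]) ** 2)

# Stronger regularization shrinks the effective model size,
# which is the mechanism behind the bias-variance trade-off:
df_small_lam = degrees_of_freedom(K, 1e-3)
df_large_lam = degrees_of_freedom(K, 1e-1)
print(df_small_lam > df_large_lam)  # larger lam -> fewer effective parameters
```

In the paper's setting, choosing the width of each internal layer comparable to this degree of freedom balances the approximation error (bias) of the finite model against the estimation error (variance) from fitting its parameters.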


