Fast learning rate of deep learning via a kernel perspective

05/29/2017


We develop a new theoretical framework for analyzing the generalization error of deep learning and derive a new fast learning rate for two representative algorithms: empirical risk minimization and Bayesian deep learning. Theoretical analyses of deep learning have revealed its high expressive power and universal approximation capability. Although these analyses are highly nonparametric, existing generalization error analyses have mainly been developed for fixed-dimensional parametric models. To fill this gap, we develop an infinite-dimensional model based on an integral form, as used in analyses of universal approximation capability. This allows us to define a reproducing kernel Hilbert space corresponding to each layer. Our viewpoint is to treat the ordinary finite-dimensional deep neural network as a finite approximation of the infinite-dimensional one. The approximation error is evaluated by the degree of freedom of the reproducing kernel Hilbert space in each layer. To estimate a good finite-dimensional model, we consider both empirical risk minimization and Bayesian deep learning. We derive generalization error bounds for both and show that a bias-variance trade-off appears in terms of the number of parameters of the finite-dimensional approximation. We show that the optimal width of the internal layers can be determined through the degree of freedom, and that the convergence rate can be faster than the O(1/√n) rate shown in existing studies.
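
The abstract does not give a formula for the degree of freedom, so the following is only an illustrative sketch. It assumes the definition commonly used in the kernel literature, N(λ) = Σ_j μ_j / (μ_j + λ), where μ_j are the eigenvalues of the kernel's integral operator and λ > 0 is a regularization level. The polynomially decaying spectrum μ_j = j^(-2), the truncation at 10,000 terms, and the function names are all hypothetical choices made for illustration, not quantities taken from the paper.

```python
def degree_of_freedom(eigenvalues, lam):
    """Assumed definition: N(lam) = sum_j mu_j / (mu_j + lam)."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    return sum(mu / (mu + lam) for mu in eigenvalues)


def polynomial_spectrum(decay, num_terms):
    """Hypothetical spectrum mu_j = j ** (-decay), truncated to num_terms."""
    return [j ** (-decay) for j in range(1, num_terms + 1)]


# Illustrative spectrum; a real analysis would use the eigenvalues of the
# kernel associated with each layer.
spectrum = polynomial_spectrum(decay=2.0, num_terms=10_000)

# A smaller lam counts more eigen-directions as "effective", so N(lam) grows.
for lam in (1e-1, 1e-2, 1e-3):
    print(f"lambda={lam:g}  N(lambda)={degree_of_freedom(spectrum, lam):.2f}")
```

Under this assumed definition, N(λ) behaves like an effective dimension: it is bounded by the number of eigenvalues and increases as λ shrinks, which is the sense in which a quantity of this kind can guide how wide a finite approximation needs to be.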



Related Research

Generalization bound of globally optimal non-convex neural network training: Transportation map estimation by infinite dimensional Langevin dynamics

Optimal Rates for Regularized Conditional Mean Embedding Learning

Deep Learning with Kernels through RKHM and the Perron-Frobenius Operator

Minimax Optimal Kernel Operator Learning via Multilevel Training

Autoencoding any Data through Kernel Autoencoders

Fast Learning Rate of Non-Sparse Multiple Kernel Learning and Optimal Regularization Strategies

How much is optimal reinsurance degraded by error?