Generalization in Deep Learning

10/16/2017
by Kenji Kawaguchi, et al.

This paper explains why deep learning can generalize well, despite large capacity and possible algorithmic instability, nonrobustness, and sharp minima, effectively addressing an open problem in the literature. Based on our theoretical insight, this paper also proposes a family of new regularization methods. Its simplest member was empirically shown to improve base models and achieve state-of-the-art performance on MNIST and CIFAR-10 benchmarks. Moreover, this paper presents both data-dependent and data-independent generalization guarantees with improved convergence rates. Our results suggest several new open areas of research.

Related research

Sharp Minima Can Generalize For Deep Nets (03/15/2017)
Despite their overwhelming capacity to overfit, deep learning architectu...

Information-Theoretic Local Minima Characterization and Regularization (11/19/2019)
Recent advances in deep learning theory have evoked the study of general...

Robustness Implies Generalization via Data-Dependent Generalization Bounds (06/27/2022)
This paper proves that robustness implies generalization via data-depend...

SmoothOut: Smoothing Out Sharp Minima for Generalization in Large-Batch Deep Learning (05/21/2018)
In distributed deep learning, a large batch size in Stochastic Gradient ...

Entropy Penalty: Towards Generalization Beyond the IID Assumption (10/01/2019)
It has been shown that instead of learning actual object features, deep ...

Parle: parallelizing stochastic gradient descent (07/03/2017)
We propose a new algorithm called Parle for parallel training of deep ne...

Manifold Regularization for Memory-Efficient Training of Deep Neural Networks (05/26/2023)
One of the prevailing trends in the machine- and deep-learning community...
