Uniform convergence may be unable to explain generalization in deep learning
We cast doubt on the power of uniform-convergence-based generalization bounds to provide a complete picture of why overparameterized deep networks generalize well. While it is well known that many existing bounds are numerically large, through a variety of experiments we first bring to light another crucial and more concerning aspect of these bounds: in practice, they can increase with the dataset size. Guided by our observations, we then show how uniform convergence could provably break down even in a simple setup that preserves the key elements of deep learning: we present a noisy algorithm that learns a mildly overparameterized linear classifier such that uniform convergence cannot "explain generalization," even if we take into account implicit regularization to the fullest extent possible. More precisely, even if we consider only the set of classifiers output by the algorithm that have test errors less than some small ϵ, applying (two-sided) uniform convergence on this set of classifiers yields a generalization guarantee that is larger than 1 − ϵ and is therefore nearly vacuous.
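The final claim can be written out as a rough sketch. The notation below is an assumption introduced here for illustration and does not appear in the abstract: D is the data distribution, S a training sample, h_S the classifier the algorithm outputs on S, L_D the test error, and \hat{L}_S the training error on S. The high-probability quantifier over S is omitted for brevity.

```latex
% Illustrative notation only (not taken from the abstract).
% Classifiers the algorithm can output whose test error is below \epsilon:
\[
  \mathcal{H}_\epsilon
  = \bigl\{\, h_{S'} \;:\; L_{\mathcal{D}}(h_{S'}) < \epsilon \,\bigr\}.
\]
% Two-sided uniform convergence over this set controls the worst-case gap:
\[
  \epsilon_{\mathrm{unif}}
  = \sup_{h \in \mathcal{H}_\epsilon}
    \bigl|\, L_{\mathcal{D}}(h) - \hat{L}_{S}(h) \,\bigr|.
\]
% The abstract's claim, in this notation:
\[
  \epsilon_{\mathrm{unif}} > 1 - \epsilon .
\]
% The guarantee that uniform convergence then provides,
\[
  L_{\mathcal{D}}(h_S) \le \hat{L}_{S}(h_S) + \epsilon_{\mathrm{unif}},
\]
% has a right-hand side exceeding 1 - \epsilon, although errors lie in [0, 1].
```

Under this reading, and for ϵ < 1/2, the gap can exceed 1 − ϵ only if some classifier in the set has low test error yet empirical error above 1 − ϵ on the particular sample S, since its test error is already below ϵ.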