Generalization Bounds of SGLD for Non-convex Learning: Two Theoretical Viewpoints

by Wenlong Mou, et al.

Algorithm-dependent generalization error bounds are central to statistical learning theory. A learning algorithm may use a large hypothesis space, but the limited number of iterations controls its model capacity and generalization error. The impact of stochastic gradient methods on the generalization error of non-convex learning problems not only has important theoretical consequences, but is also critical to understanding the generalization of deep learning. In this paper, we study the generalization errors of Stochastic Gradient Langevin Dynamics (SGLD) with non-convex objectives. Two theories are developed through non-asymptotic, discrete-time analysis, using stability and PAC-Bayesian results respectively. The stability-based theory yields a bound of O((L/n)√(βT_k)), where L is the uniform Lipschitz parameter, β is the inverse temperature, and T_k is the aggregated step size. For the PAC-Bayesian theory, although the bound has a slower O(1/√n) rate, the contribution of each step carries an exponentially decaying factor once ℓ² regularization is imposed, and the uniform Lipschitz constant is replaced by the actual norms of the gradients along the trajectory. Our bounds have no implicit dependence on dimensions, norms, or other capacity measures of the parameter, which elegantly characterizes the phenomenon of "Fast Training Guarantees Generalization" in non-convex settings. This is the first algorithm-dependent result with reasonable dependence on aggregated step sizes for non-convex learning, and it has important implications for the statistical learning aspects of stochastic gradient methods in complicated models such as deep learning.
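The quantities in the bounds above (the inverse temperature β and the aggregated step size T_k = Σ η_k) come directly from the SGLD update rule. A minimal NumPy sketch of that update — the function and argument names here (`grad_fn`, `step_sizes`, `data`) are illustrative placeholders, not the paper's notation:

```python
import numpy as np

def sgld(grad_fn, w0, data, beta=1e3, step_sizes=None, rng=None):
    """Sketch of Stochastic Gradient Langevin Dynamics (SGLD).

    Update: w_{k+1} = w_k - eta_k * g_k + sqrt(2 * eta_k / beta) * N(0, I),
    where g_k is a stochastic gradient, beta is the inverse temperature,
    and T_k = sum of the step sizes eta_k is the "aggregated step size"
    that appears in the generalization bounds.
    """
    rng = np.random.default_rng(rng)
    w = np.array(w0, dtype=float)
    if step_sizes is None:
        step_sizes = [0.01] * 100
    T_k = 0.0
    for eta in step_sizes:
        i = rng.integers(len(data))           # sample one data point
        g = grad_fn(w, data[i])               # stochastic gradient at w
        noise = rng.standard_normal(w.shape)  # isotropic Gaussian noise
        w = w - eta * g + np.sqrt(2.0 * eta / beta) * noise
        T_k += eta                            # accumulate aggregated step size
    return w, T_k
```

With a large β the noise term is small and the iterate behaves like plain SGD; smaller β injects more noise, which is what drives the stability-based O((L/n)√(βT_k)) bound.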



