Uniform-in-Time Wasserstein Stability Bounds for (Noisy) Stochastic Gradient Descent

by   Lingjiong Zhu, et al.

Algorithmic stability is an important notion that has proven powerful for deriving generalization bounds for practical algorithms. The last decade has witnessed an increasing number of stability bounds for different algorithms applied on different classes of loss functions. While these bounds have illuminated various properties of optimization algorithms, the analysis of each case typically required a different proof technique with significantly different mathematical tools. In this study, we make a novel connection between learning theory and applied probability and introduce a unified guideline for proving Wasserstein stability bounds for stochastic optimization algorithms. We illustrate our approach on stochastic gradient descent (SGD) and we obtain time-uniform stability bounds (i.e., the bound does not increase with the number of iterations) for strongly convex losses and non-convex losses with additive noise, where we recover similar results to the prior art or extend them to more general cases by using a single proof technique. Our approach is flexible and can be generalizable to other popular optimizers, as it mainly requires developing Lyapunov functions, which are often readily available in the literature. It also illustrates that ergodicity is an important component for obtaining time-uniform bounds – which might not be achieved for convex or non-convex losses unless additional noise is injected to the iterates. Finally, we slightly stretch our analysis technique and prove time-uniform bounds for SGD under convex and non-convex losses (without additional additive noise), which, to our knowledge, is novel.


page 1

page 2

page 3

page 4


Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses

Uniform stability is a notion of algorithmic stability that bounds the w...

Fine-Grained Analysis of Stability and Generalization for Stochastic Gradient Descent

Recently there are a considerable amount of work devoted to the study of...

Multi-fidelity Stability for Graph Representation Learning

In the problem of structured prediction with graph representation learni...

Generalization Error Bounds for Optimization Algorithms via Stability

Many machine learning tasks can be formulated as Regularized Empirical R...

Generalization Bounds for Stochastic Gradient Langevin Dynamics: A Unified View via Information Leakage Analysis

Recently, generalization bounds of the non-convex empirical risk minimiz...

Time-independent Generalization Bounds for SGLD in Non-convex Settings

We establish generalization error bounds for stochastic gradient Langevi...

Please sign up or login with your details

Forgot password? Click here to reset