Generalization Bounds for Stochastic Gradient Descent via Localized ε-Covers

09/19/2022
by Sejun Park et al.

In this paper, we propose a new covering technique localized to the trajectories of SGD. This localization provides an algorithm-specific complexity measured by the covering number, which can have dimension-independent cardinality, in contrast to standard uniform covering arguments that incur exponential dimension dependency. Based on this localized construction, we show that if the objective function is a finite perturbation of a piecewise strongly convex and smooth function with P pieces, i.e., non-convex and non-smooth in general, the generalization error can be upper bounded by O(√(log n · log(nP) / n)), where n is the number of data samples. In particular, this rate is independent of the dimension and does not require early stopping or a decaying step size. Finally, we employ these results in various contexts and derive generalization bounds for multi-index linear models, multi-class support vector machines, and K-means clustering for both hard and soft label setups, improving the known state-of-the-art rates.
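As a purely illustrative companion to the setting above, the sketch below runs constant-step-size SGD, with no early stopping, on the K-means objective, one of the applications listed in the abstract: for a sample x and centers w = (w_1, ..., w_K), the per-sample loss min_k ½(w_k − x)² has one strongly convex, smooth quadratic piece per Voronoi cell, yet is non-convex and non-smooth overall. The sketch then estimates the empirical-vs-population risk gap that the O(√(log n · log(nP) / n)) bound controls. The 1-D Gaussian-mixture data, the values of K, n, the step size, and the epoch count are assumptions made for this sketch, not the paper's construction, and the code does not implement the localized covering argument itself.

```python
# Toy illustration (not the paper's construction): SGD with a constant step
# size and no early stopping on the K-means objective. For a sample x and
# centers w, the loss min_k 0.5 * (w_k - x)^2 is piecewise strongly convex
# and smooth in w (one quadratic piece per Voronoi cell) but non-convex and
# non-smooth overall. All numerical choices below are assumptions.
import numpy as np

rng = np.random.default_rng(0)
K, n, step, epochs = 3, 500, 0.05, 30

def loss(w, X):
    # mean over samples of min_k 0.5 * (w_k - x)^2
    d2 = 0.5 * (X[:, None] - w[None, :]) ** 2   # shape (n_samples, K)
    return d2.min(axis=1).mean()

def sgd_grad(w, x):
    # piecewise gradient: only the nearest center moves
    k = np.argmin((w - x) ** 2)
    g = np.zeros_like(w)
    g[k] = w[k] - x
    return g

def sample(m):
    # 1-D mixture of three Gaussians as the data distribution (an assumption)
    centers = rng.choice([-4.0, 0.0, 4.0], size=m)
    return centers + rng.normal(scale=0.5, size=m)

train, test = sample(n), sample(100_000)   # large test set as a population proxy

w = rng.normal(size=K)                     # random initialization of the centers
for _ in range(epochs):
    for x in rng.permutation(train):
        w -= step * sgd_grad(w, x)

gap = loss(w, test) - loss(w, train)
print(f"centers = {np.sort(w)}, empirical risk = {loss(w, train):.4f}, "
      f"population proxy = {loss(w, test):.4f}, gap = {gap:.4f}")
```

Rerunning with a larger n should shrink the reported gap, qualitatively in line with the √(log n · log(nP) / n) rate, although this toy scale is far too small to verify the rate itself.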


Related research

12/08/2012 · Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes
Stochastic Gradient Descent (SGD) is one of the simplest and most popula...

06/28/2021 · The Convergence Rate of SGD's Final Iterate: Analysis on Dimension Dependence
Stochastic Gradient Descent (SGD) is among the simplest and most popular...

10/25/2018 · Uniform Convergence of Gradients for Non-Convex Learning and Optimization
We investigate 1) the rate at which refined properties of the empirical ...

04/29/2021 · Fine-grained Generalization Analysis of Vector-valued Learning
Many fundamental machine learning tasks can be formulated as a problem o...

11/12/2020 · Towards Optimal Problem Dependent Generalization Error Bounds in Statistical Learning Theory
We study problem-dependent rates, i.e., generalization errors that scale...

07/25/2017 · Error Bounds for Piecewise Smooth and Switching Regression
The paper deals with regression problems, in which the nonsmooth target ...
