Generalization Bounds via Convex Analysis

02/10/2022
by Gábor Lugosi et al.

Since the celebrated works of Russo and Zou (2016, 2019) and Xu and Raginsky (2017), it has been well known that the generalization error of supervised learning algorithms can be bounded in terms of the mutual information between their input and the output, given that the loss of any fixed hypothesis has a subgaussian tail. In this work, we generalize this result beyond the standard choice of Shannon's mutual information to measure the dependence between the input and the output. Our main result shows that it is indeed possible to replace the mutual information by any strongly convex function of the joint input-output distribution, with the subgaussianity condition on the losses replaced by a bound on an appropriately chosen norm capturing the geometry of the dependence measure. This allows us to derive a range of generalization bounds that are either entirely new or strengthen previously known ones. Examples include bounds stated in terms of p-norm divergences and the Wasserstein-2 distance, which are respectively applicable for heavy-tailed loss distributions and highly smooth loss functions. Our analysis is entirely based on elementary tools from convex analysis by tracking the growth of a potential function associated with the dependence measure and the loss function.
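For context, the classical result that this work generalizes is the mutual-information bound of Xu and Raginsky (2017). A minimal statement of that bound is sketched below in LaTeX; the notation (the training sample S, the output hypothesis W, the sample size n, the subgaussianity parameter σ, and the risks L_μ and L_S) is not defined on this page and is assumed here in its standard form.

% Classical mutual-information generalization bound (Xu & Raginsky, 2017),
% which the present paper extends to general strongly convex dependence measures.
% Assumption: the loss \ell(w, Z) is \sigma-subgaussian for every fixed hypothesis w.
\[
  \Bigl| \mathbb{E}\bigl[ L_\mu(W) - L_S(W) \bigr] \Bigr|
    \;\le\; \sqrt{\frac{2\sigma^2 \, I(W; S)}{n}}
\]
% where S = (Z_1, \dots, Z_n) is the training sample of n i.i.d. examples,
% W is the hypothesis returned by the learning algorithm,
% L_\mu and L_S are the population and empirical risks, and
% I(W; S) is Shannon's mutual information between input and output.

The abstract's main result replaces I(W; S) by a general strongly convex function of the joint input-output distribution, and replaces the subgaussianity assumption by a bound in a norm adapted to that dependence measure.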
