Asymptotic Finite Sample Information Losses in Neural Classifiers

by Brandon Foggo, et al.

This paper studies the information losses that arise when neural classifiers are trained on finite datasets. It proves that such losses are related to the product of two quantities: the expected total variation of the estimated neural model, and the information about the feature space contained in that model's hidden representation. It then shows that this total variation drops extremely quickly with sample size. The resulting bounds on information loss are less sensitive to input compression and much tighter than existing bounds. Because these bounds make information theory more directly relevant to the training of neural networks, the paper reviews techniques for information estimation and control. It then describes potential uses of the bounds in active learning, and uses them to explain recent experimental findings of information compression in neural networks that previous work cannot explain. It also uses the bounds to justify an information regularization term when training neural networks on low entropy feature space problems. Finally, the paper shows that the bounds are not only much tighter than existing ones but also agree with experiments.
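To make the two quantities named in the abstract concrete, here is a minimal sketch that computes a total variation distance and a mutual information for small discrete distributions. It is not the paper's method and does not reproduce its bound or constants. The function names, the toy distributions, and the choice to simply multiply the two numbers are illustrative assumptions.

```python
import math


def total_variation(p, q):
    """Total variation distance between two discrete distributions
    defined on the same finite support (lists of probabilities)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def mutual_information(joint):
    """Mutual information I(X; T) in nats for a joint probability
    table joint[x][t] over two discrete variables."""
    px = [sum(row) for row in joint]        # marginal of X
    pt = [sum(col) for col in zip(*joint)]  # marginal of T
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxt in enumerate(row):
            if pxt > 0:  # zero-probability cells contribute nothing
                mi += pxt * math.log(pxt / (px[i] * pt[j]))
    return mi


# Hypothetical toy numbers, chosen only for illustration.
true_model = [0.7, 0.3]        # assumed "true" class probabilities
estimated_model = [0.6, 0.4]   # assumed finite-sample estimate
tv = total_variation(true_model, estimated_model)

# Assumed joint distribution of a feature X and a hidden representation T.
joint_xt = [[0.4, 0.1],
            [0.1, 0.4]]
mi = mutual_information(joint_xt)

# The abstract relates information loss to a product of quantities of
# this kind; the exact form of the bound is given in the paper.
product = tv * mi
print(f"TV = {tv:.4f}, I(X;T) = {mi:.4f} nats, product = {product:.4f}")
```

In this toy setting, a hidden representation that carries less information about the features (a smaller `mi`) or an estimate closer to the true model (a smaller `tv`) both shrink the product.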




