Understanding the Behaviour of the Empirical Cross-Entropy Beyond the Training Distribution

05/28/2019
by Matías Vera, et al.

Machine learning theory has mostly focused on generalization to samples drawn from the same distribution as the training data. However, a better understanding of generalization beyond the training distribution, where the observed distribution may change, is also fundamentally important for achieving a more powerful form of generalization. In this paper, we study, through the lens of information measures, how a particular architecture behaves when the true probability law of the samples is potentially different at training and testing times. Our main result is that the testing gap between the empirical cross-entropy and its statistical expectation (measured with respect to the testing probability law) can be bounded, with high probability, by the mutual information between the input testing samples and the corresponding representations generated by the encoder obtained at training time. These theoretical results are supported by numerical simulations showing that the mutual information in question is representative of the testing gap, qualitatively capturing its dynamics in terms of the hyperparameters of the network.
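To make the objects in this statement concrete, the following is a rough sketch of the kind of bound involved; the notation (testing input X, representation U produced by the trained encoder, learned decoder Q) and the constants are illustrative assumptions, not the paper's exact theorem.

\[
\widehat{\mathcal{L}}_n(Q) \;=\; -\frac{1}{n}\sum_{i=1}^{n}\log Q_{\hat Y\mid U}\!\left(y_i \mid u_i\right),
\qquad
\mathcal{L}(Q) \;=\; \mathbb{E}_{P_{\mathrm{test}}}\!\left[-\log Q_{\hat Y\mid U}(Y\mid U)\right],
\]
\[
\Pr\!\left(\,\bigl|\widehat{\mathcal{L}}_n(Q)-\mathcal{L}(Q)\bigr| \;\le\; A_\delta\,\sqrt{\frac{I(X;U)}{n}} \;+\; \frac{B_\delta}{\sqrt{n}}\right)\;\ge\;1-\delta,
\]

where \(u_i\) is the representation generated for the testing input \(x_i\) by the encoder obtained at training time, \(I(X;U)\) is the mutual information between the testing inputs and their representations, and \(A_\delta\), \(B_\delta\) are confidence-dependent constants. The key point, as in the abstract above, is that the deviation of the empirical cross-entropy from its expectation under the testing law is controlled by \(I(X;U)\).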

