Information Bottleneck Analysis of Deep Neural Networks via Lossy Compression

by Ivan Butakov, et al.

The Information Bottleneck (IB) principle offers an information-theoretic framework for analyzing the training process of deep neural networks (DNNs). Its essence lies in tracking the dynamics of two mutual information (MI) values: one between the hidden layer and the class label, and the other between the hidden layer and the DNN input. According to the hypothesis put forth by Shwartz-Ziv and Tishby (2017), the training process consists of two distinct phases: fitting and compression. The latter phase is believed to account for the good generalization performance exhibited by DNNs. Due to the challenging nature of estimating MI between high-dimensional random vectors, this hypothesis has only been verified for toy NNs or specific types of NNs, such as quantized NNs and dropout NNs. In this paper, we introduce a comprehensive framework for conducting IB analysis of general NNs. Our approach leverages the stochastic NN method proposed by Goldfeld et al. (2019) and incorporates a compression step to overcome the obstacles associated with high dimensionality. In other words, we estimate the MI between the compressed representations of high-dimensional random vectors. The proposed method is supported by both theoretical and practical justifications. Notably, we demonstrate the accuracy of our estimator through synthetic experiments featuring predefined MI values. Finally, we perform IB analysis on a close-to-real-scale convolutional DNN, which reveals new features of the MI dynamics.
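The compress-then-estimate idea and the synthetic check against a known MI value can be illustrated with a minimal sketch. It is not the authors' implementation. As assumptions for illustration only, PCA stands in for the compression step, a Gaussian plug-in formula stands in for the MI estimator, and the data are jointly Gaussian latents embedded linearly into a higher-dimensional space. All function names and parameters below are hypothetical.

```python
import numpy as np


def gaussian_mi(x, y):
    """Plug-in MI estimate in nats, assuming (x, y) is jointly Gaussian."""
    dx = x.shape[1]
    cov = np.cov(np.hstack([x, y]), rowvar=False)
    _, logdet_x = np.linalg.slogdet(cov[:dx, :dx])
    _, logdet_y = np.linalg.slogdet(cov[dx:, dx:])
    _, logdet_xy = np.linalg.slogdet(cov)
    return 0.5 * (logdet_x + logdet_y - logdet_xy)


def pca_compress(data, k):
    """Project centered data onto its top-k principal directions.

    Assumption: PCA is a simple stand-in for a learned compressor.
    """
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T


def synthetic_check(n=20000, latent_dim=2, ambient_dim=64, rho=0.8, seed=0):
    """Compare the estimate on compressed data with the known MI."""
    rng = np.random.default_rng(seed)

    # Latent pair with per-component correlation rho, so the true MI is
    # -(latent_dim / 2) * log(1 - rho^2) nats.
    x = rng.standard_normal((n, latent_dim))
    noise = rng.standard_normal((n, latent_dim))
    y = rho * x + np.sqrt(1.0 - rho**2) * noise
    true_mi = -0.5 * latent_dim * np.log(1.0 - rho**2)

    # Random linear embeddings into a higher-dimensional space. They are
    # injective with probability one, so the MI is unchanged.
    embed_x = rng.standard_normal((latent_dim, ambient_dim))
    embed_y = rng.standard_normal((latent_dim, ambient_dim))
    x_high, y_high = x @ embed_x, y @ embed_y

    # Compress each high-dimensional vector, then estimate the MI
    # between the compressed representations.
    estimate = gaussian_mi(
        pca_compress(x_high, latent_dim),
        pca_compress(y_high, latent_dim),
    )
    return true_mi, estimate


if __name__ == "__main__":
    true_mi, estimate = synthetic_check()
    print(f"true MI: {true_mi:.4f} nats, estimate: {estimate:.4f} nats")
```

In this toy setting the compression loses no information, because the high-dimensional data lie on a linear subspace whose dimension matches the compressed size. Real network activations would generally require a nonlinear compressor and a non-parametric MI estimator.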




