Estimating Information Flow in Neural Networks

by Ziv Goldfeld et al.

We study the flow of information and the evolution of internal representations during deep neural network (DNN) training, aiming to demystify the compression aspect of the information bottleneck theory. The theory suggests that DNN training comprises a rapid fitting phase followed by a slower compression phase, in which the mutual information I(X;T) between the input X and internal representations T decreases. Several papers observe compression of estimated mutual information on different DNN models, but the true I(X;T) over these networks is provably either constant (discrete X) or infinite (continuous X). This work explains the discrepancy between theory and experiments, and clarifies what was actually measured by these past works. To this end, we introduce an auxiliary (noisy) DNN framework for which I(X;T) is a meaningful quantity that depends on the network's parameters. This noisy framework is shown to be a good proxy for the original (deterministic) DNN both in terms of performance and the learned representations. We then develop a rigorous estimator for I(X;T) in noisy DNNs and observe compression in various models. By relating I(X;T) in the noisy DNN to an information-theoretic communication problem, we show that compression is driven by the progressive clustering of hidden representations of inputs from the same class. Several methods to directly monitor clustering of hidden representations, both in noisy and deterministic DNNs, are used to show that meaningful clusters form in the T space. Finally, we return to the estimator of I(X;T) employed in past works, and demonstrate that while it fails to capture the true (vacuous) mutual information, it does serve as a measure for clustering. This clarifies the past observations of compression and isolates the geometric clustering of hidden representations as the true phenomenon of interest.
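The key idea in the noisy framework is that adding Gaussian noise to a hidden layer, T = f(X) + Z with Z ~ N(0, σ²I), turns T into a Gaussian mixture when X is discrete, so I(X;T) = h(T) − h(T|X) is finite and depends on the weights. The sketch below (a hypothetical toy setup, not the paper's actual estimator; the layer, weights, and inputs are illustrative) estimates this mutual information by Monte Carlo: h(T|X) is the closed-form entropy of an isotropic Gaussian, and h(T) is approximated by sampling from the mixture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "noisy DNN" layer: T = tanh(W x) + sigma * Z, Z ~ N(0, I).
# With discrete X, T is a Gaussian mixture, so I(X;T) is finite and
# depends on W -- unlike the deterministic case, where it is vacuous.
d_in, d_hid, k = 3, 2, 4            # input dim, hidden dim, # of input points
W = rng.normal(size=(d_hid, d_in))
X = rng.normal(size=(k, d_in))      # k equiprobable input points
centers = np.tanh(X @ W.T)          # deterministic part f(x) of the layer

def mi_monte_carlo(centers, sigma, n=20000):
    """Estimate I(X;T) = h(T) - h(T|X) in nats by Monte Carlo.

    h(T|X) is the differential entropy of N(0, sigma^2 I); h(T) is
    estimated by sampling T from the mixture and averaging -log p(T).
    """
    k, d = centers.shape
    idx = rng.integers(k, size=n)
    t = centers[idx] + sigma * rng.normal(size=(n, d))
    # mixture density p(t) = (1/k) sum_j N(t; c_j, sigma^2 I)
    sq = ((t[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    log_comp = -sq / (2 * sigma**2) - d / 2 * np.log(2 * np.pi * sigma**2)
    log_p = np.logaddexp.reduce(log_comp, axis=1) - np.log(k)
    h_T = -log_p.mean()
    h_T_given_X = d / 2 * np.log(2 * np.pi * np.e * sigma**2)
    return h_T - h_T_given_X

# Small noise: I(X;T) approaches H(X) = log(4) nats when the clusters
# f(x) are well separated.  Large noise swamps the clusters, driving
# I(X;T) toward 0 -- the same mechanism behind compression-by-clustering.
print(mi_monte_carlo(centers, sigma=0.05))
print(mi_monte_carlo(centers, sigma=5.0))
```

This mirrors the paper's qualitative story: as training clusters same-class representations tighter relative to the noise scale, the mixture components merge and I(X;T) drops.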




Related Research

- Information Bottleneck Analysis of Deep Neural Networks via Lossy Compression
- The deterministic information bottleneck
- Bounding generalization error with input compression: An empirical study with infinite-width networks
- Information Bottleneck: Exact Analysis of (Quantized) Neural Networks
- Information Plane Analysis for Dropout Neural Networks
- Examining the causal structures of deep neural networks using information theory
- Mutual Information for Explainable Deep Learning of Multiscale Systems
