Information Dropout: Learning Optimal Representations Through Noisy Computation

11/04/2016
by Alessandro Achille, et al.

The cross-entropy loss commonly used in deep learning is closely related to the defining properties of optimal representations, but does not enforce some of the key properties. We show that this can be solved by adding a regularization term, which is in turn related to injecting multiplicative noise in the activations of a Deep Neural Network, a special case of which is the common practice of dropout. We show that our regularized loss function can be efficiently minimized using Information Dropout, a generalization of dropout rooted in information-theoretic principles that automatically adapts to the data and can better exploit architectures of limited capacity. When the task is the reconstruction of the input, we show that our loss function yields a Variational Autoencoder as a special case, thus providing a link between representation learning, information theory and variational inference. Finally, we prove that we can promote the creation of disentangled representations simply by enforcing a factorized prior, a fact that has been observed empirically in recent work. Our experiments validate the theoretical intuitions behind our method, and we find that Information Dropout achieves comparable or better generalization performance than binary dropout, especially on smaller models, since it can automatically adapt the noise to the structure of the network, as well as to the test sample.
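The mechanism described above, multiplicative noise whose magnitude is learned from the data plus a regularizer that penalizes how much information the representation retains about the input, can be illustrated as a single layer. The following is a minimal sketch in PyTorch, not the authors' reference implementation: the module name InformationDropoutLayer, the sigmoid parameterization of the noise scale, and the max_alpha cap are assumptions made for illustration; the per-unit KL penalty of the form -log(alpha) (up to constants) corresponds to the log-uniform prior the paper discusses for ReLU activations.

```python
import torch
import torch.nn as nn


class InformationDropoutLayer(nn.Module):
    """Illustrative sketch of an Information Dropout layer (not the reference code).

    Each activation is multiplied by log-normal noise eps = exp(alpha * N(0, 1)),
    where the noise scale alpha is predicted from the same features, so the layer
    adapts the amount of noise to the data. The returned KL term is added to the
    task loss as the regularizer described in the abstract.
    """

    def __init__(self, features: int, max_alpha: float = 0.7):
        super().__init__()
        # Small head that predicts a per-unit noise scale from the activations.
        self.alpha_head = nn.Linear(features, features)
        self.max_alpha = max_alpha  # assumed cap keeping the noise variance bounded

    def forward(self, x: torch.Tensor):
        # alpha in (0, max_alpha): standard deviation of the log of the noise.
        alpha = self.max_alpha * torch.sigmoid(self.alpha_head(x)) + 1e-4
        if self.training:
            # Multiplicative log-normal noise on the activations.
            eps = torch.exp(alpha * torch.randn_like(x))
            y = x * eps
        else:
            y = x  # no noise injected at test time
        # With ReLU activations and a log-uniform prior, the per-unit KL term
        # reduces to -log(alpha) up to additive constants.
        kl = -torch.log(alpha).mean()
        return y, kl
```

In training, the returned kl term would be scaled by a coefficient (the weight of the information-theoretic regularizer, a hyperparameter) and added to the cross-entropy loss; learning alpha from the data, rather than fixing it as in ordinary dropout, is what lets the layer adapt the noise to the structure of the network and to the individual sample.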

