Exponentially Increasing the Capacity-to-Computation Ratio for Conditional Computation in Deep Learning

06/28/2014
by   Kyunghyun Cho, et al.

Many state-of-the-art results obtained with deep networks are achieved with the largest models that can be trained, and if more computational power were available, we might be able to exploit much larger datasets in order to improve generalization ability. Whereas in learning algorithms such as decision trees the ratio of capacity (e.g., the number of parameters) to computation is very favorable (up to exponentially more parameters than computation), the ratio is essentially 1 for deep neural networks. Conditional computation has been proposed as a way to increase the capacity of a deep neural network without increasing the amount of computation required, by activating some parameters and computation "on demand", on a per-example basis. In this note, we propose a novel parametrization of weight matrices in neural networks which has the potential to increase the ratio of the number of parameters to computation by up to an exponential factor. The proposed approach is based on turning on some parameters (weight matrices) only when specific bit patterns of hidden-unit activations are observed. To better control the overfitting that might result, we propose a tree-structured parametrization in which each node of the tree corresponds to a prefix of a sequence of sign bits, or gating units, associated with the hidden units.
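To make the gating idea concrete, here is a minimal numpy sketch of the kind of tree-structured parametrization the abstract describes: sign bits computed from the hidden activations trace a root-to-leaf path in a binary tree of weight matrices, and only the matrices along that path are summed into the effective weight matrix. All names and dimensions (`V`, `tree_weights`, `n_bits`, and so on) are illustrative assumptions, not the paper's actual notation or implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

n_hidden = 8   # hidden layer width (illustrative)
n_out = 8      # output width (illustrative)
n_bits = 3     # gating sign bits; the tree then has 2**(n_bits+1) - 1 nodes

# Gating projection: the signs of V @ h select the active tree path.
V = rng.normal(scale=0.1, size=(n_bits, n_hidden))

# One weight matrix per tree node, keyed by the bit prefix that reaches it.
tree_weights = {}

def build(prefix):
    tree_weights[prefix] = rng.normal(scale=0.1, size=(n_out, n_hidden))
    if len(prefix) < n_bits:
        build(prefix + (0,))
        build(prefix + (1,))

build(())  # fills all 2**(n_bits+1) - 1 nodes (exponential capacity)

def conditional_layer(h):
    # Sign pattern of the gating units for this example.
    bits = tuple(int(b) for b in (V @ h > 0))
    # Only the n_bits + 1 matrices on the root-to-leaf path are touched,
    # so per-example computation stays small even as the tree grows.
    U = sum(tree_weights[bits[:k]] for k in range(n_bits + 1))
    return np.tanh(U @ h)

h = rng.normal(size=n_hidden)
print(conditional_layer(h))
```

Under these assumptions, `n_bits` gating units index a tree storing 2^(n_bits+1) - 1 weight matrices, of which each example touches only n_bits + 1: that gap between stored and used parameters is the exponential capacity-to-computation ratio the abstract refers to.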



