A block-random algorithm for learning on distributed, heterogeneous data

by Prakash Mohan, et al.

Most deep learning models are deep neural networks with multiple layers between input and output. The parameters defining these layers are initialized with random values and are "learned" from data, typically using stochastic gradient descent (SGD) based algorithms. These algorithms rely on the data being randomly shuffled before optimization. For in situ model training, however, the global pre-shuffle that SGD formally requires to derive a useful deep learning model is expected to be prohibitively expensive because of the data communication it induces across processor nodes. We show that SGD can still make useful progress if batches are defined on a per-processor basis and processed in random order, even though (i) each batch is constructed from data samples of a single class or a specific flow region, and (ii) the overall data samples are heterogeneous. We present block-random gradient descent, a new algorithm that works on distributed, heterogeneous data without pre-shuffling, enabling in situ learning for exascale simulations. Its performance is demonstrated on a set of benchmark classification problems and on the construction of a subgrid-scale model for large eddy simulation (LES) of turbulent channel flow, using a data layout similar to that expected in exascale simulations.


