ResIST: Layer-Wise Decomposition of ResNets for Distributed Training

07/02/2021
by   Chen Dun, et al.

We propose ResIST, a novel distributed training protocol for Residual Networks (ResNets). ResIST randomly decomposes a global ResNet into several shallow sub-ResNets that are trained independently in a distributed manner for several local iterations, before having their updates synchronized and aggregated into the global model. In the next round, new sub-ResNets are randomly generated and the process repeats. By construction, per iteration, ResIST communicates only a small portion of network parameters to each machine and never uses the full model during training. Thus, ResIST reduces the communication, memory, and time requirements of ResNet training to only a fraction of the requirements of previous methods. In comparison to common protocols like data-parallel training and data-parallel training with local SGD, ResIST yields a decrease in wall-clock training time, while being competitive with respect to model performance.
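The round structure described above (random decomposition into shallow sub-ResNets, independent local training, then synchronization back into the global model) can be illustrated with a small single-process sketch. The snippet below is a toy approximation, not the paper's implementation: `ToyResNet`, `partition_blocks`, and `resist_round` are hypothetical names, the residual blocks are simple fully connected layers, the "workers" are simulated sequentially in one process, and the input/output layers are assumed to be shared by every sub-network and averaged at aggregation.

```python
# Minimal single-process sketch of one ResIST-style round (illustrative only).
import copy
import random
from itertools import cycle

import torch
import torch.nn as nn


class ToyResNet(nn.Module):
    """Global model: an input layer, a stack of residual blocks, an output layer."""

    def __init__(self, width=64, num_blocks=8, num_classes=10):
        super().__init__()
        self.inp = nn.Linear(784, width)
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(width, width), nn.ReLU()) for _ in range(num_blocks)]
        )
        self.out = nn.Linear(width, num_classes)

    def forward(self, x, active=None):
        h = torch.relu(self.inp(x))
        idx = active if active is not None else range(len(self.blocks))
        for i in idx:
            h = h + self.blocks[i](h)  # skip connection lets unused blocks be dropped
        return self.out(h)


def partition_blocks(num_blocks, num_workers):
    """Randomly assign each residual block to exactly one worker; every worker
    also keeps the shared input/output layers, so each sub-ResNet is shallow
    but still a complete network."""
    order = list(range(num_blocks))
    random.shuffle(order)
    return [order[w::num_workers] for w in range(num_workers)]


def resist_round(global_model, loaders, local_iters=5, lr=0.1):
    """One round: decompose, train each sub-ResNet locally, then aggregate."""
    num_workers = len(loaders)
    assignment = partition_blocks(len(global_model.blocks), num_workers)
    workers = []
    for w in range(num_workers):
        # For simplicity the full model is copied here; in a real distributed
        # setting only the assigned blocks plus shared layers would be sent.
        local = copy.deepcopy(global_model)
        opt = torch.optim.SGD(local.parameters(), lr=lr)
        data_iter = cycle(loaders[w])
        for _ in range(local_iters):
            x, y = next(data_iter)
            loss = nn.functional.cross_entropy(local(x, active=assignment[w]), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
        workers.append(local)

    # Aggregation: a block trained by a single worker is copied back directly;
    # the shared input/output layers are averaged across all workers.
    with torch.no_grad():
        for w, local in enumerate(workers):
            for i in assignment[w]:
                global_model.blocks[i].load_state_dict(local.blocks[i].state_dict())
        for name in ("inp", "out"):
            params = [dict(getattr(m, name).named_parameters()) for m in workers]
            for p_name, p_global in getattr(global_model, name).named_parameters():
                p_global.copy_(torch.stack([p[p_name] for p in params]).mean(0))
    return global_model


# Example usage with random data for two simulated workers:
# model = ToyResNet()
# loaders = [[(torch.randn(32, 784), torch.randint(0, 10, (32,)))] for _ in range(2)]
# model = resist_round(model, loaders)
```

The skip connections are what make this decomposition reasonable: when a block is omitted from a sub-ResNet, the identity path still carries activations forward, so each shallow sub-network remains a valid model that can be trained on its own before its blocks are merged back.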
