Hanayo: Harnessing Wave-like Pipeline Parallelism for Enhanced Large Model Training Efficiency

08/30/2023
by Ziming Liu et al.

Large-scale language models have become increasingly challenging and expensive to train. Among various methods addressing this issue, pipeline parallelism has been widely employed to accommodate massive model weights within limited GPU memory. This paper introduces Hanayo, a wave-like pipeline parallelism strategy that boasts a concise structure and practical applicability, alongside a high-performance pipeline execution runtime to tackle the challenges of pipeline strategy implementation. Hanayo mitigates the issues of pipeline bubbles and excessive memory consumption prevalent in existing schemes, without resorting to model duplicates as in Chimera. Our evaluation, conducted on four distinct computing clusters and involving both GPT-like and BERT-like architectures with up to 32 GPUs, demonstrates up to a 30.4% increase in throughput compared to the state-of-the-art approach.
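For intuition, the sketch below is a back-of-the-envelope estimate (not the paper's implementation) of why wave-style scheduling can shrink pipeline bubbles: a pipeline with p stages and m micro-batches spends roughly (p - 1) slots idle during warm-up and cool-down, and splitting the schedule into more waves divides that idle span. The function name `bubble_fraction` and the `num_waves` parameter are illustrative assumptions, not identifiers from Hanayo.

```python
# Illustrative sketch only: a simplified cost model for pipeline-bubble
# overhead, assuming uniform per-micro-batch compute. Hanayo's actual
# wave-like schedule may differ in detail.

def bubble_fraction(num_stages: int, num_microbatches: int, num_waves: int = 1) -> float:
    """Approximate fraction of one pipeline iteration spent idle (the bubble).

    With a classic one-wave schedule, warm-up plus cool-down costs about
    (p - 1) slots out of (m + p - 1) total. Splitting the work into more
    waves shrinks that idle span proportionally (hypothetical model).
    """
    p, m, w = num_stages, num_microbatches, num_waves
    idle_slots = (p - 1) / w   # warm-up + cool-down, reduced by the wave count
    busy_slots = m             # one slot of useful work per micro-batch
    return idle_slots / (idle_slots + busy_slots)


if __name__ == "__main__":
    # Example: 8 pipeline stages, 16 micro-batches, varying wave counts.
    for waves in (1, 2, 4):
        print(f"waves={waves}: bubble fraction = {bubble_fraction(8, 16, waves):.1%}")
```

Under this toy model, moving from one wave to four waves on an 8-stage, 16-micro-batch pipeline cuts the estimated bubble fraction substantially; the real trade-off also involves memory footprint and communication, which the sketch ignores.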


Related research

04/09/2021 · Efficient Large-Scale Language Model Training on GPU Clusters
07/02/2020 · DAPPLE: A Pipelined Data Parallel Approach for Training Large Models
07/14/2021 · Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines
06/16/2020 · Memory-Efficient Pipeline-Parallel DNN Training
03/03/2023 · Ada-Grouper: Accelerating Pipeline Parallelism in Preempted Network by Adaptive Group-Scheduling for Micro-Batches
04/22/2023 · Pipeline MoE: A Flexible MoE Implementation with Pipeline Parallelism
11/10/2021 · Amazon SageMaker Model Parallelism: A General and Flexible Framework for Large Model Training
