Ouroboros: On Accelerating Training of Transformer-Based Language Models

09/14/2019
by Qian Yang, et al.

Language models are essential for natural language processing (NLP) tasks such as machine translation and text summarization. Transformer-based language models with over a billion parameters have recently demonstrated remarkable performance across many NLP domains, verifying the benefits of model size. Model parallelism is required when a model is too large to fit on a single computing device. Existing methods for model parallelism either suffer from backward locking in backpropagation or are not applicable to language models. We propose the first model-parallel algorithm that speeds up the training of Transformer-based language models. We also prove that the proposed algorithm is guaranteed to converge to critical points for non-convex problems. Extensive experiments on the Transformer and Transformer-XL language models show that the proposed algorithm achieves a significant speedup beyond data parallelism, with comparable or better accuracy. Code to reproduce the experiments is available at https://github.com/LaraQianYang/Ouroboros.
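
To make the problem concrete, the sketch below illustrates vanilla model parallelism for a Transformer and the backward locking the abstract refers to. It is not the Ouroboros algorithm itself: the device names, layer counts, and dummy loss are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Minimal sketch of naive model parallelism, NOT the Ouroboros algorithm.
# The layers are split across two GPUs. The standard backward pass is
# "backward locked": gradients for the layers on cuda:0 cannot be computed
# until the backward pass through the layers on cuda:1 has finished.

class TwoDeviceTransformer(nn.Module):
    def __init__(self, d_model=512, nhead=8, layers_per_device=6):
        super().__init__()
        make = lambda: nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        # First half of the model lives on device 0, second half on device 1.
        self.block0 = nn.Sequential(*[make() for _ in range(layers_per_device)]).to("cuda:0")
        self.block1 = nn.Sequential(*[make() for _ in range(layers_per_device)]).to("cuda:1")

    def forward(self, x):
        h = self.block0(x.to("cuda:0"))    # forward through the first half
        return self.block1(h.to("cuda:1"))  # hand activations to the second half

model = TwoDeviceTransformer()
x = torch.randn(8, 128, 512)               # (batch, sequence, d_model); dummy input
loss = model(x).pow(2).mean()              # dummy loss for illustration
loss.backward()  # strictly sequential: cuda:0 idles until cuda:1's backward completes
```

Because autograd unwinds the graph in reverse order, the device holding the earlier layers sits idle during the later layers' backward pass; this serialization is the backward locking that the proposed algorithm is designed to avoid.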
