Scaling Expert Language Models with Unsupervised Domain Discovery

03/24/2023
by Suchin Gururangan, et al.

Large language models are typically trained densely: all parameters are updated with respect to all inputs. This requires synchronization of billions of parameters across thousands of GPUs. We introduce a simple but effective method to asynchronously train large, sparse language models on arbitrary text corpora. Our method clusters a corpus into sets of related documents, trains a separate expert language model on each cluster, and combines them in a sparse ensemble for inference. This approach generalizes embarrassingly parallel training by automatically discovering the domains for each expert, and eliminates nearly all the communication overhead of existing sparse language models. Our technique outperforms dense baselines on multiple corpora and few-shot tasks, and our analysis shows that specializing experts to meaningful clusters is key to these gains. Performance also improves with the number of experts and size of training data, suggesting this is a highly efficient and accessible approach to training large language models.
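The pipeline the abstract describes has three stages: cluster the corpus into sets of related documents, train one expert language model per cluster with no cross-expert communication, and route to a small subset of experts at inference time. A minimal sketch of that flow, under heavy simplifying assumptions: a four-document toy corpus stands in for a large training corpus, bag-of-words k-means stands in for the paper's clustering over learned representations, per-cluster unigram models stand in for full expert LMs, and top-1 routing stands in for the paper's sparse ensemble.

```python
# Sketch of cluster-then-train expert LMs. All components here are toy
# stand-ins for illustration, not the paper's actual implementation.
import math
import random
from collections import Counter

DIMS = 16

def embed(doc):
    """Hashed bag-of-words vector (stand-in for a learned document embedding)."""
    vec = [0.0] * DIMS
    for tok in doc.lower().split():
        h = 0
        for ch in tok:                      # stable polynomial hash
            h = (h * 31 + ord(ch)) % 10007
        vec[h % DIMS] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iters=20, seed=0):
    """Plain k-means: the unsupervised domain discovery step."""
    rng = random.Random(seed)
    centers = rng.sample(vectors, k)
    assign = [0] * len(vectors)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: dist2(v, centers[c])) for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assign

# Step 1: cluster the corpus into related document sets.
corpus = [
    "the patient received a dose of the drug",
    "clinical trial results show the drug works",
    "the court ruled the contract invalid",
    "the judge dismissed the lawsuit in court",
]
centers, assign = kmeans([embed(d) for d in corpus], k=2)

# Step 2: train one expert per cluster -- embarrassingly parallel, since each
# expert sees only its own cluster and never synchronizes with the others.
experts = []
for c in range(len(centers)):
    counts = Counter(tok for d, a in zip(corpus, assign) if a == c
                     for tok in d.lower().split())
    total = sum(counts.values()) or 1
    experts.append({t: n / total for t, n in counts.items()})

# Step 3: sparse inference -- route each input to its nearest cluster's expert
# (top-1 here; the paper ensembles a small subset of experts per context).
def score(text):
    v = embed(text)
    c = min(range(len(centers)), key=lambda i: dist2(v, centers[i]))
    probs = experts[c]
    return sum(math.log(probs.get(t, 1e-9)) for t in text.lower().split())
```

Because step 2 touches only local data, each expert can be trained on separate hardware with no parameter synchronization, which is the source of the communication savings the abstract claims.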

