On the Multilingual Capabilities of Very Large-Scale English Language Models

08/30/2021
by Jordi Armengol-Estapé, et al.

Generative Pre-trained Transformers (GPTs) have recently been scaled to unprecedented sizes in the history of machine learning. These models, trained solely on the language modeling objective, have been shown to exhibit outstanding few-shot learning capabilities on a number of different tasks. Nevertheless, beyond anecdotal experiences, little is known about their multilingual capabilities, given that the pre-training corpus is almost entirely composed of English text. In this work, we investigate the multilingual skills of GPT-3, focusing on one language that barely appears in the pre-training corpus, Catalan, which makes the results especially meaningful; we expect our findings to be relevant for other languages as well. We find that the model performs outstandingly, particularly in generative tasks; its limitations lie mostly, and predictably, in language understanding tasks, where the results are nonetheless remarkable given the zero-shot scenario. We investigate its potential and limits in extractive question-answering and natural language generation, as well as the effect of scale in terms of model size.
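To make the zero-shot setup concrete, below is a minimal sketch of how one might probe GPT-3 on Catalan extractive question answering, assuming the pre-1.0 openai Python client and the davinci engine that were current around the paper's publication; the Catalan passage, question, and prompt template are illustrative assumptions, not the paper's actual evaluation harness.

```python
# Minimal sketch: zero-shot Catalan extractive QA with GPT-3.
# Assumes the pre-1.0 openai Python client and the "davinci" engine;
# passage, question, and prompt template are illustrative only.
import openai

openai.api_key = "YOUR_API_KEY"  # hypothetical placeholder

# Catalan context passage and question (illustrative example).
context = (
    "La Sagrada Família és una basílica catòlica de Barcelona "
    "dissenyada per l'arquitecte Antoni Gaudí."
)
question = "Qui va dissenyar la Sagrada Família?"

# Zero-shot prompt: only the task framing, no in-context examples.
prompt = f"Context: {context}\nPregunta: {question}\nResposta:"

response = openai.Completion.create(
    engine="davinci",   # the largest publicly available GPT-3 model at the time
    prompt=prompt,
    max_tokens=32,
    temperature=0.0,    # greedy decoding, appropriate for extractive answers
    stop=["\n"],
)
print(response["choices"][0]["text"].strip())
```

A few-shot variant would simply prepend one or more solved context/question/answer triples to the same prompt, which is how scale-dependent in-context learning is typically exercised.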


Related research:

- Applying Multilingual Models to Question Answering (QA) (12/04/2022)
  We study the performance of monolingual and multilingual language models...

- AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model (08/02/2022)
  In this work, we demonstrate that multilingual large-scale sequence-to-s...

- What Language Model to Train if You Have One Million GPU Hours? (10/27/2022)
  The crystallization of modeling methods around the Transformer architect...

- What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers (09/10/2021)
  GPT-3 shows remarkable in-context learning ability of large-scale langua...

- Unsupervised Inflection Generation Using Neural Language Modeling (12/03/2019)
  The use of Deep Neural Network architectures for Language Modeling has r...

- mmT5: Modular Multilingual Pre-Training Solves Source Language Hallucinations (05/23/2023)
  Multilingual sequence-to-sequence models perform poorly with increased l...

- BERTuit: Understanding Spanish language in Twitter through a native transformer (04/07/2022)
  The appearance of complex attention-based language models such as BERT, ...
