Larger-Scale Transformers for Multilingual Masked Language Modeling

05/02/2021
by   Naman Goyal, et al.

Recent work has demonstrated the effectiveness of cross-lingual language model pretraining for cross-lingual understanding. In this study, we present the results of two larger multilingual masked language models, with 3.5B and 10.7B parameters. Our two new models, dubbed XLM-R XL and XLM-R XXL, outperform XLM-R by 1.8% and 2.4% average accuracy on XNLI. Our model also outperforms the RoBERTa-Large model on several English tasks of the GLUE benchmark by 0.3% on average while handling 99 more languages. This suggests that pretrained models with larger capacity may achieve strong performance on high-resource languages while greatly improving performance on low-resource languages. We make our code and models publicly available.
