Towards Fully Bilingual Deep Language Modeling

by Li-Hsin Chang, et al.

Language models based on deep neural networks have facilitated great advances in natural language processing and understanding tasks in recent years. While models covering a large number of languages have been introduced, their multilinguality has come at a cost in terms of monolingual performance, and the best-performing models on most tasks not involving cross-lingual transfer remain monolingual. In this paper, we consider the question of whether it is possible to pre-train a bilingual model for two remotely related languages without compromising performance in either language. We collect pre-training data, create a Finnish-English bilingual BERT model, and evaluate its performance on the datasets used to evaluate the corresponding monolingual models. Our bilingual model performs on par with Google's original English BERT on GLUE and nearly matches the performance of monolingual Finnish BERT on a range of Finnish NLP tasks, clearly outperforming multilingual BERT. We find that when the model vocabulary size is increased, the BERT-Base architecture has sufficient capacity to learn two remotely related languages to a level where it achieves performance comparable to that of monolingual models, demonstrating the feasibility of training fully bilingual deep language models. The model and all tools involved in its creation are freely available at




