Multilingual BERT Post-Pretraining Alignment

10/23/2020

We propose a simple method to align multilingual contextual embeddings as a post-pretraining step, improving the zero-shot cross-lingual transferability of pretrained models. Using parallel data, our method aligns embeddings at the word level through the recently proposed Translation Language Modeling objective, and at the sentence level via contrastive learning and random input shuffling. We also perform code-switching with English when finetuning on downstream tasks. On XNLI, our best model (initialized from mBERT) improves over mBERT by 4.7% in the zero-shot setting and achieves results comparable to XLM for translate-train while using less than 18% of the same parallel data and 31% fewer model parameters. On MLQA, our model outperforms XLM-R_Base, which has 57% more parameters than ours.
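
To make the sentence-level objective concrete, the following is a minimal sketch of a contrastive alignment loss over translation pairs. The abstract does not specify the form of the loss, so the in-batch InfoNCE formulation, the cosine similarity, the `temperature` value, and the function names here are all illustrative assumptions rather than the authors' implementation. Embeddings are assumed to be non-zero vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_alignment_loss(src, tgt, temperature=0.1):
    """Mean in-batch InfoNCE loss (an assumed formulation).

    src[i] and tgt[i] are sentence embeddings of a translation pair;
    every other tgt[j] in the batch serves as a negative for src[i].
    """
    total = 0.0
    for i, s in enumerate(src):
        logits = [cosine(s, t) / temperature for t in tgt]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of picking the true translation.
        total += log_norm - logits[i]
    return total / len(src)


# Toy 2-dimensional "sentence embeddings" (hypothetical values).
en = [[1.0, 0.0], [0.0, 1.0]]
de_aligned = [[0.9, 0.1], [0.1, 0.9]]   # translations close to their sources
de_swapped = [[0.1, 0.9], [0.9, 0.1]]   # translations close to the wrong source

loss_aligned = contrastive_alignment_loss(en, de_aligned)
loss_swapped = contrastive_alignment_loss(en, de_swapped)
```

In this toy setup the loss is lower when each translation lies closest to its own source sentence, which is the behavior a sentence-level alignment objective is meant to reward.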
