Probing Multilingual Language Models for Discourse

06/09/2021
by   Murathan Kurfalı, et al.
0

Pre-trained multilingual language models have become an important building block in multilingual natural language processing. In the present paper, we investigate a range of such models to find out how well they transfer discourse-level knowledge across languages. This is done with a systematic evaluation on a broader set of discourse-level tasks than has been previously been assembled. We find that the XLM-RoBERTa family of models consistently show the best performance, by simultaneously being good monolingual models and degrading relatively little in a zero-shot setting. Our results also indicate that model distillation may hurt the ability of cross-lingual transfer of sentence representations, while language dissimilarity at most has a modest effect. We hope that our test suite, covering 5 tasks with a total of 22 languages in 10 distinct families, will serve as a useful evaluation platform for multilingual performance at and beyond the sentence level.

READ FULL TEXT
research
02/28/2022

Cross-Lingual Text Classification with Multilingual Distillation and Zero-Shot-Aware Training

Multilingual pre-trained language models (MPLMs) not only can handle tas...
research
06/04/2019

How multilingual is Multilingual BERT?

In this paper, we show that Multilingual BERT (M-BERT), released by Devl...
research
05/02/2023

MultiLegalSBD: A Multilingual Legal Sentence Boundary Detection Dataset

Sentence Boundary Detection (SBD) is one of the foundational building bl...
research
09/10/2021

Augmenting BERT-style Models with Predictive Coding to Improve Discourse-level Representations

Current language models are usually trained using a self-supervised sche...
research
08/15/2023

A User-Centered Evaluation of Spanish Text Simplification

We present an evaluation of text simplification (TS) in Spanish for a pr...
research
04/15/2021

XTREME-R: Towards More Challenging and Nuanced Multilingual Evaluation

Machine learning has brought striking advances in multilingual natural l...
research
12/16/2022

Lessons learned from the evaluation of Spanish Language Models

Given the impact of language models on the field of Natural Language Pro...

Please sign up or login with your details

Forgot password? Click here to reset