Specializing Multilingual Language Models: An Empirical Study

06/16/2021
by Ethan C. Chau, et al.

Contextualized word representations from pretrained multilingual language models have become the de facto standard for addressing natural language tasks in many different languages, but the success of this approach is far from universal. For languages rarely or never seen by these models, directly using such models often results in suboptimal representation or use of data, motivating additional model adaptations to achieve reasonably strong performance. In this work, we study the performance, extensibility, and interaction of two such adaptations for this low-resource setting: vocabulary augmentation and script transliteration. Our evaluations on a set of three tasks in nine diverse low-resource languages yield mixed results, upholding the viability of these approaches while raising new questions around how to optimally adapt multilingual models to low-resource settings.
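
Of the two adaptations named in the abstract, vocabulary augmentation is the one most easily illustrated in code. The following is a minimal sketch, assuming the HuggingFace transformers library; the model name and the example token list are illustrative placeholders, not details taken from the paper. The idea is to add target-language subword types to the tokenizer and resize the embedding matrix so the new types receive trainable vectors before continued pretraining or fine-tuning.

# Minimal sketch of vocabulary augmentation for a multilingual encoder.
# Assumes the HuggingFace `transformers` library; the model name and the
# example subword list below are placeholders, not from the paper.
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Hypothetical list of frequent target-language subwords; in practice these
# would be derived from target-language text (e.g., a newly trained
# WordPiece/SentencePiece vocabulary).
new_subwords = ["ngqo", "tlhaka", "dzivha"]

num_added = tokenizer.add_tokens(new_subwords)
# Grow the embedding matrix so the new ids get (randomly initialized) vectors,
# which are then learned during continued pretraining on target-language data.
model.resize_token_embeddings(len(tokenizer))
print(f"Added {num_added} tokens; new vocabulary size: {len(tokenizer)}")

Script transliteration, the second adaptation, operates on the input text instead of the model: target-language text is mapped into a script the pretrained model has seen frequently, so no change to the tokenizer or embeddings is required.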
