Multilingual Text Representation

09/02/2023
by Fahim Faisal, et al.

Modern NLP breakthroughs include large multilingual models capable of performing tasks across more than 100 languages. State-of-the-art language models have come a long way from the simple one-hot representation of words: they can now perform tasks such as natural language understanding, common-sense reasoning, and question answering, capturing both the syntax and the semantics of text. At the same time, language models are expanding beyond familiar language boundaries, even performing competitively on very low-resource dialects of endangered languages. However, problems remain to be solved before texts are represented equitably in a unified modeling space across languages and speakers. In this survey, we shed light on the iterative progression of multilingual text representation and discuss the driving factors that ultimately led to the current state of the art. We then discuss how the full potential of language democratization could be realized beyond the known limits, and what scope for improvement remains in that space.

