A Simple and Effective Method To Eliminate the Self Language Bias in Multilingual Representations

09/10/2021
by ZiYi Yang, et al.

Language-agnostic representation and the isolation of semantic and language information constitute an emerging research direction for multilingual representation models. We explore this problem from a novel angle of geometric algebra and semantic space. A simple but highly effective method, "Language Information Removal (LIR)", factors out language identity information from semantically related components in multilingual representations pre-trained on multi-monolingual data. LIR is a post-training, model-agnostic method that uses only simple linear operations, e.g., matrix factorization and orthogonal projection. LIR reveals that for weak-alignment multilingual systems, the principal components of the semantic space primarily encode language identity information. We first evaluate LIR on a cross-lingual question-answer retrieval task (LAReQA), which requires strong alignment of the multilingual embedding space. Experiments show that LIR is highly effective on this task, yielding almost 100% relative improvement in MAP for weak-alignment models. We then evaluate LIR on the Amazon Reviews and XEVAL datasets and observe that removing language information improves cross-lingual transfer performance.
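The abstract describes LIR only as "matrix factorization and orthogonal projection". The following is a minimal NumPy sketch of that idea under stated assumptions, not the authors' implementation. It assumes that the factorization is a per-language SVD of the raw (un-centered) embedding matrix, that the top `rank` singular directions of each language carry its identity information, and that removal is a projection onto their orthogonal complement. The function names, the `rank` parameter, and the synthetic two-language data are all hypothetical.

```python
from __future__ import annotations

import numpy as np


def language_subspace(embeddings: np.ndarray, rank: int) -> np.ndarray:
    """Return the top-`rank` right singular vectors of an (n, d) matrix.

    Rows are sentence embeddings from ONE language. The (rank, d) result has
    orthonormal rows; this sketch ASSUMES they span the language-identity
    directions (no mean-centering is applied).
    """
    # Economy SVD: vt has shape (min(n, d), d), rows sorted by singular value.
    _, _, vt = np.linalg.svd(embeddings, full_matrices=False)
    return vt[:rank]


def remove_language_information(embeddings: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Project every row onto the orthogonal complement of span(basis).

    Because basis has orthonormal rows, the projection is e - (e B^T) B.
    """
    return embeddings - (embeddings @ basis.T) @ basis


def lir(embeddings_by_lang: dict[str, np.ndarray], rank: int) -> dict[str, np.ndarray]:
    """Apply the removal separately to each language's embedding matrix."""
    return {
        lang: remove_language_information(x, language_subspace(x, rank))
        for lang, x in embeddings_by_lang.items()
    }


def mean_pair_cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Mean cosine similarity between row i of `a` and row i of `b`."""
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    return float(np.mean(np.sum(a_n * b_n, axis=1)))


# Hypothetical toy data: 50 parallel sentences sharing a semantic vector,
# with each language adding a large offset along its own identity direction.
rng = np.random.default_rng(0)
n, d = 50, 8
semantics = rng.normal(size=(n, d))
offset_a = np.zeros(d)
offset_a[0] = 5.0
offset_b = np.zeros(d)
offset_b[1] = 5.0
raw = {"A": semantics + offset_a, "B": semantics + offset_b}
clean = lir(raw, rank=1)
print("before:", mean_pair_cosine(raw["A"], raw["B"]))
print("after: ", mean_pair_cosine(clean["A"], clean["B"]))
```

On this toy data, the language offsets dominate the first singular direction of each language. Projecting that direction out raises the similarity between parallel sentences across languages, which mirrors, in miniature, the alignment improvement the abstract reports. Real embeddings would need a chosen `rank`, and whether to center or normalize first is a design choice this sketch does not settle.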
