Should All Cross-Lingual Embeddings Speak English?

11/08/2019
by Antonios Anastasopoulos, et al.

Most recent work in cross-lingual word embeddings is severely Anglocentric. The vast majority of lexicon induction evaluation dictionaries are between English and another language, and the English embedding space is selected by default as the hub when learning in a multilingual setting. With this work, however, we challenge these practices. First, we show that the choice of hub language can significantly impact downstream lexicon induction performance. Second, we both expand the current evaluation dictionary collection to include all language pairs using triangulation, and create new dictionaries for under-represented languages. Evaluating established methods over all these language pairs sheds light on their suitability and presents new challenges for the field. Finally, in our analysis we identify general guidelines for strong cross-lingual embedding baselines, based on more than just Anglocentric experiments.

research
05/17/2020

Cross-Lingual Word Embeddings for Turkic Languages

There has been an increasing interest in learning cross-lingual word emb...
research
09/12/2019

Lost in Evaluation: Misleading Benchmarks for Bilingual Dictionary Induction

The task of bilingual dictionary induction (BDI) is commonly used for in...
research
05/01/2020

Why Overfitting Isn't Always Bad: Retrofitting Cross-Lingual Word Embeddings to Dictionaries

Cross-lingual word embeddings (CLWE) are often evaluated on bilingual le...
research
02/01/2019

How to (Properly) Evaluate Cross-Lingual Word Embeddings: On Strong Baselines, Comparative Analyses, and Some Misconceptions

Cross-lingual word embeddings (CLEs) enable multilingual modeling of mea...
research
04/23/2018

Bilingual Embeddings with Random Walks over Multilingual Wordnets

Bilingual word embeddings represent words of two languages in the same s...
research
08/10/2018

Learning to Represent Bilingual Dictionaries

Bilingual word embeddings have been widely used to capture the similarit...
research
11/27/2019

Findings of the 2016 WMT Shared Task on Cross-lingual Pronoun Prediction

We describe the design, the evaluation setup, and the results of the 201...
