Lost in Evaluation: Misleading Benchmarks for Bilingual Dictionary Induction

09/12/2019
by Yova Kementchedjhieva, et al.

The task of bilingual dictionary induction (BDI) is commonly used for intrinsic evaluation of cross-lingual word embeddings. The largest dataset for BDI was generated automatically, so its quality is dubious. We study the composition and quality of the test sets for five diverse languages from this dataset, with concerning findings: (1) a quarter of the data consists of proper nouns, which can hardly be indicative of BDI performance, and (2) there are pervasive gaps in the gold-standard targets. These issues appear to affect the ranking of cross-lingual embedding systems on individual languages, and the overall degree to which the systems differ in performance. With proper nouns removed from the data, the margin between the top two systems included in the study grows from 3.4 [...] on the other hand, reveals that gaps in the gold-standard targets artificially inflate the margin between the two systems on English to Bulgarian BDI from 0.1 [...] conclusions from quantitative results on this BDI dataset, or accompanies such evaluation with rigorous error analysis.
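To make the two issues concrete, the following is a minimal sketch of how a BDI score could be computed and how it shifts when proper nouns are dropped or a gold-standard gap is filled. It assumes precision at 1 as the metric; the function name `precision_at_1`, the `skip` parameter, and the toy English to Bulgarian entries are all invented for illustration and are not taken from the paper or its dataset.

```python
from typing import Dict, Optional, Set


def precision_at_1(
    predictions: Dict[str, str],
    gold: Dict[str, Set[str]],
    skip: Optional[Set[str]] = None,
) -> float:
    """Fraction of source words whose top prediction is among the gold targets.

    `skip` is a hypothetical hook for excluding source words (e.g. proper nouns).
    """
    skip = skip or set()
    sources = [s for s in gold if s not in skip]
    if not sources:
        return 0.0
    hits = sum(1 for s in sources if predictions.get(s) in gold[s])
    return hits / len(sources)


# Toy data, invented for illustration only.
gold = {
    "dog": {"куче"},
    "house": {"къща"},   # assumed gap: "дом" is also acceptable but not listed
    "paris": {"париж"},  # proper noun
    "small": {"малък"},
}
predictions = {"dog": "куче", "house": "дом", "paris": "париж", "small": "малък"}
proper_nouns = {"paris"}

score_all = precision_at_1(predictions, gold)
score_no_names = precision_at_1(predictions, gold, skip=proper_nouns)

# Simulate manual verification that accepts the missing translation.
gold_verified = {**gold, "house": gold["house"] | {"дом"}}
score_verified = precision_at_1(predictions, gold_verified)

print(score_all, score_no_names, score_verified)
```

In this toy setup the proper noun is an easy hit, so removing it lowers the score, while filling the gap in the gold targets raises it. The direction and size of such shifts on real data depend on the systems being compared.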


