Low-resource bilingual lexicon extraction using graph-based word embeddings

10/06/2017
by Ximena Gutierrez-Vasques, et al.

In this work we focus on the task of automatically extracting a bilingual lexicon for the language pair Spanish-Nahuatl. This is a low-resource setting in which only a small parallel corpus is available. Most downstream methods do not work well under low-resource conditions, and this is especially true of approaches that rely on vector representations such as Word2Vec. We propose instead to construct bilingual word vectors from a graph, generated from translation pairs obtained with an unsupervised word-alignment method. We show that, in a low-resource setting, this type of vector successfully represents words in a bilingual semantic space. Moreover, when a linear transformation is applied to translate words from one language to the other, our graph-based representations considerably outperform the popular setting that uses Word2Vec.
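The pipeline the abstract describes can be sketched end to end: build a bilingual graph whose edges are aligned translation pairs, use each word's normalized adjacency row as its graph-based vector, then fit a linear map between the two languages over seed pairs. This is a minimal illustrative sketch, not the authors' implementation: the word pairs are invented examples, and the least-squares mapping stands in for whichever linear transformation the paper uses.

```python
import numpy as np

# Hypothetical Spanish-Nahuatl translation pairs, standing in for the
# output of an unsupervised word aligner (examples are illustrative).
pairs = [("agua", "atl"), ("casa", "calli"),
         ("flor", "xochitl"), ("perro", "itzcuintli")]

# Build an undirected bilingual graph: nodes are words of both
# languages, edges connect aligned translation pairs.
vocab = sorted({w for p in pairs for w in p})
idx = {w: i for i, w in enumerate(vocab)}
A = np.zeros((len(vocab), len(vocab)))
for es, nah in pairs:
    A[idx[es], idx[nah]] += 1
    A[idx[nah], idx[es]] += 1

# Each word's normalized adjacency row serves as its graph-based vector,
# placing both languages in one shared space.
norms = np.linalg.norm(A, axis=1, keepdims=True)
vectors = A / np.where(norms == 0, 1, norms)

# Fit a linear transformation from Spanish to Nahuatl vectors over the
# seed pairs by least squares.
X = np.stack([vectors[idx[es]] for es, _ in pairs])
Y = np.stack([vectors[idx[nah]] for _, nah in pairs])
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Translate "agua": map its vector and take the nearest Nahuatl word
# by dot product (vectors are unit length, so this is cosine).
query = vectors[idx["agua"]] @ W
nah_words = [nah for _, nah in pairs]
best = max(nah_words, key=lambda w: float(query @ vectors[idx[w]]))
print(best)  # → atl
```

On this toy graph the mapping recovers the seed translation exactly; the interesting case in the paper is generalizing the learned transformation to words outside the seed lexicon.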


