Cross-Lingual Word Embeddings for Turkic Languages

05/17/2020
by Elmurod Kuriyozov, et al.

There has been increasing interest in learning cross-lingual word embeddings to transfer knowledge obtained from a resource-rich language, such as English, to lower-resource languages for which annotated data is scarce, such as Turkish, Russian, and many others. In this paper, we present the first viability study of established techniques for aligning monolingual embedding spaces for Turkish, Uzbek, Azeri, Kazakh, and Kyrgyz, members of the Turkic family, which is heavily affected by the low-resource constraint. These techniques are known to require little explicit supervision, mainly in the form of bilingual dictionaries, and are hence easily adaptable to different domains, including low-resource ones. We obtain new bilingual dictionaries and new word embeddings for these languages and show the steps for obtaining cross-lingual word embeddings using state-of-the-art techniques. We then evaluate the results on the bilingual dictionary induction task. Our experiments confirm that the new bilingual dictionaries outperform previously available ones, and that word embeddings for a low-resource language can benefit from closely related, resource-rich languages when they are aligned together. Furthermore, evaluation on an extrinsic task (sentiment analysis in Uzbek) shows that monolingual word embeddings can benefit, although only slightly, from cross-lingual alignments.
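To make the dictionary-supervised alignment and the bilingual dictionary induction (BDI) evaluation more concrete, here is a minimal sketch. The abstract does not name the specific alignment methods used, so this is an illustrative assumption: it uses orthogonal Procrustes, one widely used supervised technique of this kind, and plain nearest-neighbour retrieval scored by precision@1 (many toolkits use CSLS retrieval instead). The function names (`procrustes`, `translate`, `precision_at_1`) and the synthetic toy data are hypothetical and do not come from the paper.

```python
import numpy as np

def procrustes(X, Y):
    """Return the orthogonal W minimising ||XW - Y||_F.

    Row i of X (source embedding) and row i of Y (target embedding)
    form one seed-dictionary translation pair.
    """
    # SVD of the d x d cross-covariance X^T Y = U S V^T gives W = U V^T.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def normalize(M):
    # Unit-length rows, so dot products become cosine similarities.
    return M / np.linalg.norm(M, axis=1, keepdims=True)

def translate(src_emb, tgt_emb, W):
    # Map every source vector into the target space, then pick its nearest
    # target word by cosine similarity (plain NN retrieval, not CSLS).
    sims = normalize(src_emb @ W) @ normalize(tgt_emb).T
    return sims.argmax(axis=1)

def precision_at_1(src_emb, tgt_emb, W, test_pairs):
    # Fraction of held-out (source, target) pairs whose gold target is ranked first.
    predicted = translate(src_emb, tgt_emb, W)
    hits = sum(int(predicted[s] == t) for s, t in test_pairs)
    return hits / len(test_pairs)

# Synthetic toy data (not from the paper): the "source" space is a noisy
# rotation of the "target" space, so an orthogonal map should recover it.
rng = np.random.default_rng(0)
n_words, dim = 50, 10
tgt = rng.normal(size=(n_words, dim))
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
src = tgt @ rotation.T + 0.01 * rng.normal(size=(n_words, dim))

train_ids = np.arange(30)  # seed dictionary: word i translates to word i
W = procrustes(src[train_ids], tgt[train_ids])
test_pairs = [(i, i) for i in range(30, 50)]  # held-out BDI test pairs
p1 = precision_at_1(src, tgt, W, test_pairs)
print(f"precision@1 = {p1:.2f}")
```

Because the toy source space is only a slightly noisy rotation of the target space, precision@1 on the held-out pairs should be at or near 1.0. Real monolingual embedding spaces are only approximately isomorphic, so scores on real language pairs are not expected to be this high. Constraining W to be orthogonal preserves distances within the source space, which is one reason this family of methods needs only a small seed dictionary.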
