Multilingual Culture-Independent Word Analogy Datasets

11/22/2019
by Matej Ulčar, et al.

In text processing, deep neural networks mostly use word embeddings as input. Embeddings have to ensure that relations between words are reflected in distances in a high-dimensional numeric space. Benchmark datasets are typically used to compare the quality of different text embeddings. We present a collection of such datasets for the word analogy task in nine languages: Croatian, English, Estonian, Finnish, Latvian, Lithuanian, Russian, Slovenian, and Swedish. We redesigned the original monolingual analogy task to be culturally independent and also constructed cross-lingual analogy datasets for the involved languages. We present basic statistics of the created datasets and their initial evaluation using fastText embeddings.
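To illustrate what a word analogy query looks like in practice, here is a minimal sketch in Python. It assumes the commonly used vector-offset scoring (often called 3CosAdd); the abstract does not state which scoring method the authors use, so treat that choice as an assumption. The four-dimensional vectors and the helper name `solve_analogy` are hypothetical and hand-made for the example; they are not fastText embeddings and not entries from the presented datasets.

```python
import numpy as np


def solve_analogy(embeddings, a, b, c):
    """Answer 'a is to b as c is to ?' with vector-offset (3CosAdd) scoring.

    `embeddings` maps each word to a vector. This is an illustrative
    helper, not the evaluation code used by the authors.
    """
    words = list(embeddings)
    index = {w: i for i, w in enumerate(words)}

    # Unit-normalise every vector so that dot products are cosine similarities.
    matrix = np.array([embeddings[w] for w in words], dtype=float)
    matrix /= np.linalg.norm(matrix, axis=1, keepdims=True)

    # Offset query: b - a + c, then normalise.
    target = matrix[index[b]] - matrix[index[a]] + matrix[index[c]]
    target /= np.linalg.norm(target)

    scores = matrix @ target
    # The three query words are excluded from the candidate answers.
    for w in (a, b, c):
        scores[index[w]] = -np.inf
    return words[int(np.argmax(scores))]


# Hand-made toy vectors (assumption): dimensions are
# [is_capital, Finland, Estonia, Sweden].
toy_embeddings = {
    "Helsinki":  [1, 1, 0, 0],
    "Finland":   [0, 1, 0, 0],
    "Tallinn":   [1, 0, 1, 0],
    "Estonia":   [0, 0, 1, 0],
    "Stockholm": [1, 0, 0, 1],
    "Sweden":    [0, 0, 0, 1],
}

answer = solve_analogy(toy_embeddings, "Helsinki", "Finland", "Tallinn")
print(answer)  # Estonia
```

A benchmark score would then be the fraction of analogy quadruples for which the returned word matches the expected fourth word. In a cross-lingual setting, the same query can be posed with the first pair in one language and the second pair in another, provided the embeddings share a vector space.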


