On Evaluation of Bangla Word Analogies

04/10/2023
by Mousumi Akter, et al.

This paper presents a high-quality dataset for evaluating Bangla word embeddings, a fundamental task in Natural Language Processing (NLP). Despite being the 7th most-spoken language in the world, Bangla is a low-resource language, and popular NLP models fail to perform well on it. Developing a reliable evaluation test set for Bangla word embeddings is crucial for benchmarking and guiding future research. We provide a Mikolov-style word analogy evaluation set specifically for Bangla, containing 16678 samples, as well as a translated and curated version of the Mikolov dataset, containing 10594 samples, for cross-lingual research. Our experiments with different state-of-the-art embedding models reveal that Bangla has its own unique characteristics, and that current embeddings for Bangla still struggle to achieve high accuracy on both datasets. We suggest that future research focus on training models with larger datasets and on accounting for the unique morphological characteristics of Bangla. This study represents the first step towards building a reliable NLP system for the Bangla language.
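For readers unfamiliar with Mikolov-style analogy evaluation, the following is a minimal sketch of how accuracy on such a test set is commonly scored, using the vector-offset rule often called 3CosAdd. It is an illustration under stated assumptions, not the authors' evaluation code: the function names (`normalize`, `solve_analogy`, `analogy_accuracy`), the toy two-dimensional vectors, and the English placeholder words are all hypothetical, and skipping out-of-vocabulary questions is one possible convention among several.

```python
import numpy as np

def normalize(vectors):
    """Scale every embedding to unit length so dot products equal cosines."""
    return {w: np.asarray(v, dtype=float) / np.linalg.norm(v)
            for w, v in vectors.items()}

def solve_analogy(emb, a, b, c):
    """Answer 'a is to b as c is to ?' by maximizing cos(d, b - a + c).

    The three question words are excluded from the candidates, as is
    usual for this kind of test.
    """
    target = emb[b] - emb[a] + emb[c]
    target = target / np.linalg.norm(target)
    best_word, best_score = None, -np.inf
    for word, vec in emb.items():
        if word in (a, b, c):
            continue
        score = float(np.dot(vec, target))
        if score > best_score:
            best_word, best_score = word, score
    return best_word

def analogy_accuracy(emb, questions):
    """Fraction of (a, b, c, d) questions answered exactly right.

    Assumption: questions containing an out-of-vocabulary word are skipped.
    """
    correct = 0
    total = 0
    for a, b, c, d in questions:
        if any(w not in emb for w in (a, b, c, d)):
            continue
        total += 1
        if solve_analogy(emb, a, b, c) == d:
            correct += 1
    return correct / total if total else 0.0

# Toy vectors, invented purely for illustration (not trained embeddings).
toy_emb = normalize({
    "man":   [1.0, 0.0],
    "woman": [1.0, 1.0],
    "king":  [3.0, 0.0],
    "queen": [3.0, 1.0],
    "apple": [-1.0, 0.2],
})

questions = [
    ("man", "woman", "king", "queen"),
    ("man", "woman", "uncle", "aunt"),  # skipped: words not in toy_emb
]

accuracy = analogy_accuracy(toy_emb, questions)
print(f"accuracy = {accuracy:.2f}")
```

On a real test set, `toy_emb` would be replaced by the vocabulary of a trained embedding model and `questions` by the analogy quadruples read from the dataset files.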


