Development of Word Embeddings for Uzbek Language

09/30/2020
by B. Mansurov, et al.

In this paper, we describe the process of developing word embeddings for the Cyrillic variant of the Uzbek language. The result is the first publicly available set of Uzbek word vectors, trained with the word2vec, GloVe, and fastText algorithms on a high-quality web-crawl corpus developed in-house. The resulting word embeddings can be used in many downstream natural language processing tasks.
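The paper itself does not include code here, but a minimal sketch of how such embeddings are commonly trained with the gensim library follows. The corpus path, tokenization scheme, example query word, and all hyperparameters below are illustrative assumptions, not the paper's settings; note also that GloVe has no gensim trainer and is typically trained with the Stanford glove toolkit instead.

```python
from gensim.models import FastText, Word2Vec

# "uzbek_crawl.txt" is a hypothetical stand-in for the paper's in-house
# web-crawl corpus; assume one sentence per line of whitespace-tokenized
# Cyrillic Uzbek text.
with open("uzbek_crawl.txt", encoding="utf-8") as f:
    sentences = [line.split() for line in f if line.strip()]

# Skip-gram word2vec; these hyperparameters are illustrative defaults,
# not the paper's reported configuration.
w2v = Word2Vec(sentences, vector_size=300, window=5, min_count=5, sg=1, workers=4)

# fastText also learns character n-gram vectors, so it can build vectors
# for out-of-vocabulary words, which suits Uzbek's rich morphology.
ft = FastText(sentences, vector_size=300, window=5, min_count=5, sg=1, workers=4)

# Export in the standard word2vec text format for downstream use.
w2v.wv.save_word2vec_format("uzbek_w2v.vec")
ft.wv.save_word2vec_format("uzbek_ft.vec")

# Sanity check: nearest neighbours of the Cyrillic word for "language"
# (an illustrative query, assuming it survives the min_count cutoff).
print(w2v.wv.most_similar("тил", topn=5))
```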
