Development of Word Embeddings for Uzbek Language

09/30/2020
by B. Mansurov, et al.

In this paper, we describe the process of developing word embeddings for the Cyrillic variant of the Uzbek language. The result is the first publicly available set of Uzbek word vectors, trained with the word2vec, GloVe, and fastText algorithms on a high-quality web-crawl corpus developed in-house. The resulting word embeddings can be used in many downstream natural language processing tasks.
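The paper itself does not include code here, but a minimal sketch of how such embeddings are commonly trained with the gensim library follows. The corpus path, tokenization scheme, example query word, and all hyperparameters below are illustrative assumptions, not the paper's settings; note also that GloVe has no gensim trainer and is typically trained with the Stanford glove toolkit instead.

```python
from gensim.models import FastText, Word2Vec

# "uzbek_crawl.txt" is a hypothetical stand-in for the paper's in-house
# web-crawl corpus; assume one sentence per line of whitespace-tokenized
# Cyrillic Uzbek text.
with open("uzbek_crawl.txt", encoding="utf-8") as f:
    sentences = [line.split() for line in f if line.strip()]

# Skip-gram word2vec; these hyperparameters are illustrative defaults,
# not the paper's reported configuration.
w2v = Word2Vec(sentences, vector_size=300, window=5, min_count=5, sg=1, workers=4)

# fastText also learns character n-gram vectors, so it can build vectors
# for out-of-vocabulary words, which suits Uzbek's rich morphology.
ft = FastText(sentences, vector_size=300, window=5, min_count=5, sg=1, workers=4)

# Export in the standard word2vec text format for downstream use.
w2v.wv.save_word2vec_format("uzbek_w2v.vec")
ft.wv.save_word2vec_format("uzbek_ft.vec")

# Sanity check: nearest neighbours of the Cyrillic word for "language"
# (an illustrative query, assuming it survives the min_count cutoff).
print(w2v.wv.most_similar("тил", topn=5))
```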
