KaWAT: A Word Analogy Task Dataset for Indonesian

06/17/2019
by Kemal Kurniawan, et al.

We introduced KaWAT (Kata Word Analogy Task), a new word analogy task dataset for Indonesian. We evaluated several existing pretrained Indonesian word embeddings on it, along with embeddings trained on an Indonesian online news corpus. We also tested them on two downstream tasks and found that pretrained word embeddings helped either by reducing the number of training epochs or by yielding significant performance gains.
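
For readers unfamiliar with word analogy evaluation, the sketch below shows one common way such a task can be scored: the vector-offset method (often called 3CosAdd), which answers "a is to b as c is to ?" by finding the word closest to b - a + c. This is an assumption for illustration only; the abstract does not state which scoring method the authors used. The function names `solve_analogy` and `analogy_accuracy`, the toy vectors, and the example word pairs are all hypothetical and are not taken from KaWAT.

```python
import numpy as np


def solve_analogy(emb, a, b, c):
    """Answer 'a is to b as c is to ?' with the vector-offset (3CosAdd) method.

    emb maps each word to a 1-D NumPy vector. The query words a, b, c are
    excluded from the candidates, as is customary for this kind of task.
    """
    words = list(emb)
    index = {w: i for i, w in enumerate(words)}
    # Unit-normalize rows so that dot products rank candidates by cosine.
    mat = np.stack([emb[w] for w in words]).astype(float)
    mat /= np.linalg.norm(mat, axis=1, keepdims=True)
    target = mat[index[b]] - mat[index[a]] + mat[index[c]]
    scores = mat @ target
    for w in (a, b, c):
        scores[index[w]] = -np.inf
    return words[int(np.argmax(scores))]


def analogy_accuracy(emb, questions):
    """Fraction of (a, b, c, d) questions whose predicted answer equals d."""
    correct = sum(solve_analogy(emb, a, b, c) == d for a, b, c, d in questions)
    return correct / len(questions)


# Hypothetical 3-dimensional toy vectors, hand-made for this illustration.
toy_embeddings = {
    "raja": np.array([1.0, 0.0, 1.0]),    # king
    "ratu": np.array([1.0, 1.0, 1.0]),    # queen
    "pria": np.array([0.0, 0.0, 1.0]),    # man
    "wanita": np.array([0.0, 1.0, 1.0]),  # woman
    "kucing": np.array([1.0, 0.0, 0.0]),  # cat (distractor)
}

# Hypothetical analogy questions in (a, b, c, expected d) form.
toy_questions = [
    ("pria", "raja", "wanita", "ratu"),
    ("raja", "ratu", "pria", "wanita"),
]

accuracy = analogy_accuracy(toy_embeddings, toy_questions)
print(f"analogy accuracy: {accuracy:.2f}")
```

With real embeddings, `emb` would be loaded from a pretrained model and `questions` read from the dataset file. Questions containing out-of-vocabulary words would need to be skipped or counted as errors, a choice that affects the reported accuracy.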


research
09/30/2020

Development of Word Embeddings for Uzbek Language

In this paper, we share the process of developing word embeddings for th...
research
11/05/2019

Incremental Sense Weight Training for the Interpretation of Contextualized Word Embeddings

We present a novel online algorithm that learns the essence of each dime...
research
04/16/2018

A Deeper Look into Dependency-Based Word Embeddings

We investigate the effect of various dependency-based word embeddings on...
research
01/22/2019

Delta-training: Simple Semi-Supervised Text Classification using Pretrained Word Embeddings

We propose a novel and simple method for semi-supervised text classifica...
research
04/16/2021

Word2rate: training and evaluating multiple word embeddings as statistical transitions

Using pretrained word embeddings has been shown to be a very effective w...
research
05/18/2020

Contextual Embeddings: When Are They Worth It?

We study the settings for which deep contextual embeddings (e.g., BERT) ...
research
11/06/2019

Invariance and identifiability issues for word embeddings

Word embeddings are commonly obtained as optimizers of a criterion funct...
