Investigating Bi-LSTM and CRF with POS Tag Embedding for Indonesian Named Entity Tagger

09/11/2020
by Devin Hoesen, et al.

Research on Indonesian named entity (NE) taggers has been conducted for years. However, most of that work did not use deep learning and instead employed traditional machine learning algorithms such as association rules, support vector machines, random forests, and naïve Bayes. In those studies, word lists serving as gazetteers or clue words were provided to improve accuracy. Here, we employ deep learning in our Indonesian NE tagger. We use long short-term memory (LSTM) as the topology because it is the state of the art for NE tagging. With LSTM, we do not need a word list to improve accuracy. We investigate two main things. The first is the output layer of the network: Softmax versus conditional random field (CRF). The second is the use of a part-of-speech (POS) tag embedding as an additional input layer. Using 8400 sentences as training data and 97 sentences as evaluation data, we find that using POS tag embedding as an additional input improves the performance of our Indonesian NE tagger. As for the comparison between Softmax and CRF, we find that both architectures have a weakness in classifying an NE tag.
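To make the Softmax-versus-CRF distinction concrete, the following is a minimal sketch, not the authors' implementation. It assumes a BIO-style tag set (`O`, `B-PER`, `I-PER`), which the abstract does not specify, and uses invented emission and transition scores in place of real Bi-LSTM outputs. A Softmax-style output layer picks the best tag for each token independently, whereas a CRF-style output layer also scores tag-to-tag transitions and decodes the whole sequence with the Viterbi algorithm.

```python
# Toy comparison of per-token (Softmax-style) and CRF-style (Viterbi) decoding.
# The tag set and every score below are assumptions made up for illustration;
# none of them come from the paper.

TAGS = ["O", "B-PER", "I-PER"]

# Hypothetical emission scores standing in for Bi-LSTM outputs:
# one row per token, one column per tag.
EMISSIONS = [
    [2.0, 1.0, 0.0],
    [0.5, 0.4, 1.0],
    [1.0, 0.0, 0.2],
]

# Hypothetical transition scores: TRANSITIONS[i][j] is the score of moving
# from tag i to tag j. The O -> I-PER transition is heavily penalized.
TRANSITIONS = [
    [0.0, 0.0, -10.0],  # from O
    [0.0, 0.0, 1.0],    # from B-PER
    [0.0, 0.0, 0.5],    # from I-PER
]


def greedy_decode(emissions):
    """Pick the highest-scoring tag for each token independently."""
    return [max(range(len(row)), key=row.__getitem__) for row in emissions]


def viterbi_decode(emissions, transitions):
    """Find the tag sequence maximizing emission plus transition scores."""
    n_tags = len(transitions)
    scores = list(emissions[0])
    backpointers = []
    for row in emissions[1:]:
        new_scores, pointers = [], []
        for j in range(n_tags):
            # Best previous tag i for arriving at tag j.
            best_i = max(range(n_tags), key=lambda i: scores[i] + transitions[i][j])
            new_scores.append(scores[best_i] + transitions[best_i][j] + row[j])
            pointers.append(best_i)
        scores = new_scores
        backpointers.append(pointers)
    # Trace the best path backwards from the best final tag.
    best = max(range(n_tags), key=scores.__getitem__)
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path


greedy_tags = [TAGS[i] for i in greedy_decode(EMISSIONS)]
viterbi_tags = [TAGS[i] for i in viterbi_decode(EMISSIONS, TRANSITIONS)]
print("per-token:", greedy_tags)
print("viterbi:  ", viterbi_tags)
```

With these toy scores, per-token decoding yields `O, I-PER, O`, where an inside tag follows an outside tag, while Viterbi decoding yields `B-PER, I-PER, O`. This only illustrates the mechanism; it says nothing about which output layer performs better on the paper's data.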

research
08/09/2015

Bidirectional LSTM-CRF Models for Sequence Tagging

In this paper, we propose a variety of Long Short-Term Memory (LSTM) bas...
research
09/11/2019

Comprehensive Analysis of Aspect Term Extraction Methods using Various Text Embeddings

Recently, a variety of model designs and methods have blossomed in the c...
research
09/12/2020

Relation Detection for Indonesian Language using Deep Neural Network – Support Vector Machine

Relation Detection is a task to determine whether two entities are relat...
research
10/09/2020

Constrained Decoding for Computationally Efficient Named Entity Recognition Taggers

Current state-of-the-art models for named entity recognition (NER) are n...
research
08/11/2017

Unified Neural Architecture for Drug, Disease and Clinical Entity Recognition

Most existing methods for biomedical entity recognition task rely on exp...
research
09/29/2019

Language-Agnostic Syllabification with Neural Sequence Labeling

The identification of syllables within phonetic sequences is known as sy...
research
04/29/2018

A Tree Search Algorithm for Sequence Labeling

In this paper we propose a novel reinforcement learning based model for ...
