ETNLP: A Toolkit for Extraction, Evaluation and Visualization of Pre-trained Word Embeddings

03/11/2019
by   Xuan-Son Vu, et al.

In this paper, we introduce ETNLP, a comprehensive toolkit that can evaluate, extract, and visualize multiple sets of pre-trained word embeddings. First, for evaluation, ETNLP analyses the quality of pre-trained embeddings against an input word analogy list. Second, for extraction, ETNLP provides the subset of the embeddings to be used in downstream NLP tasks. Finally, ETNLP offers a visualization module for exploring the embedded words interactively. We demonstrate the effectiveness of ETNLP on our pre-trained word embeddings for Vietnamese. Specifically, we create a large Vietnamese word analogy list to evaluate the embeddings, then use the pre-trained embeddings for the named entity recognition (NER) task in Vietnamese and achieve new state-of-the-art results on a benchmark NER dataset. A video demonstration of ETNLP is available at https://vimeo.com/317599106. The source code and data are available at https://github.com/vietnlp/etnlp.
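
To make the evaluation and extraction steps described above concrete, here is a minimal sketch using the gensim library rather than ETNLP's own interface. The file names (embeddings.vec, analogies.txt, embeddings_subset.vec) and the tiny task vocabulary are placeholders, and the analogy list is assumed to be in the standard "questions-words" format.

```python
# Minimal sketch (not ETNLP's actual API): analogy-based evaluation and
# vocabulary-subset extraction of a pre-trained embedding with gensim.
# All file names and the task vocabulary below are placeholders.
from gensim.models import KeyedVectors

# Load a pre-trained embedding stored in word2vec text format.
vectors = KeyedVectors.load_word2vec_format("embeddings.vec", binary=False)

# Evaluation: score the embedding on an analogy list in the standard
# "questions-words" format (A is to B as C is to D).
score, sections = vectors.evaluate_word_analogies("analogies.txt")
print(f"overall analogy accuracy: {score:.3f}")

# Extraction: keep only the vectors whose words occur in the downstream
# task's vocabulary (e.g., the NER training set).
task_vocab = {"Hà_Nội", "Việt_Nam", "sông", "núi"}  # placeholder vocabulary
subset = {w: vectors[w] for w in task_vocab if w in vectors.key_to_index}

# Write the subset back out in word2vec text format for the downstream task.
with open("embeddings_subset.vec", "w", encoding="utf-8") as out:
    out.write(f"{len(subset)} {vectors.vector_size}\n")
    for word, vec in subset.items():
        out.write(word + " " + " ".join(f"{x:.6f}" for x in vec) + "\n")
```

Writing the subset in the same word2vec text format means any downstream model that can read the full embedding file can read the reduced one unchanged, which is the point of the extraction step.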

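The visualization module itself is interactive, but the underlying idea can be sketched offline. The snippet below is an illustration under stated assumptions, not ETNLP's actual module: it projects random stand-in vectors (where real embedding vectors would go) to 2-D with scikit-learn's t-SNE and plots the words.

```python
# Illustrative sketch only: ETNLP's visualization is interactive, while this
# offline example projects stand-in vectors to 2-D with t-SNE and plots them.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE

words = ["Hà_Nội", "Việt_Nam", "sông", "núi", "biển", "rừng"]  # placeholders
rng = np.random.default_rng(0)
matrix = rng.normal(size=(len(words), 100))  # stand-in for real 100-d vectors

# perplexity must be smaller than the number of points being projected.
coords = TSNE(n_components=2, perplexity=min(30, len(words) - 1),
              random_state=0).fit_transform(matrix)

plt.scatter(coords[:, 0], coords[:, 1])
for (x, y), word in zip(coords, words):
    plt.annotate(word, (x, y))
plt.title("2-D t-SNE projection of embedding vectors")
plt.show()
```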

