Word-Graph2vec: An efficient word embedding approach on word co-occurrence graph using random walk sampling

01/11/2023
by Wenting Li, et al.

Word embedding has become ubiquitous and is widely used in various text mining and natural language processing (NLP) tasks, such as information retrieval, semantic analysis, and machine translation, among many others. Unfortunately, training word embeddings on a relatively large corpus is prohibitively expensive. We propose a graph-based word embedding algorithm, called Word-Graph2vec, which converts the large corpus into a word co-occurrence graph, samples word sequences from this graph via random walks, and finally trains the word embedding on this sampled corpus. We posit that, because of the stable vocabulary, recurring idioms, and fixed expressions in English, the size and density of the word co-occurrence graph change only slightly as the training corpus grows. As a result, Word-Graph2vec has a stable runtime on large-scale datasets, and its performance advantage becomes more and more obvious as the training corpus grows. Extensive experiments conducted on real-world datasets show that the proposed algorithm outperforms traditional Skip-Gram by four to five times in terms of efficiency, while the error introduced by random walk sampling remains small.
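The two graph-side stages of the pipeline described above (corpus → weighted co-occurrence graph → random-walk sampling) can be sketched as follows. This is a minimal illustration, not the authors' implementation; the window size, walk length, and number of walks per node are hypothetical parameters, and the resulting walks would then be fed to any off-the-shelf Skip-Gram trainer.

```python
import random
from collections import defaultdict

def build_cooccurrence_graph(sentences, window=2):
    """Build a weighted word co-occurrence graph: an edge's weight is the
    number of times the two words co-occur within the sliding window."""
    graph = defaultdict(lambda: defaultdict(int))
    for tokens in sentences:
        for i, w in enumerate(tokens):
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                graph[w][tokens[j]] += 1
                graph[tokens[j]][w] += 1
    return graph

def random_walks(graph, num_walks=5, walk_length=10, seed=0):
    """Sample word sequences by weight-biased random walks over the graph;
    the walks serve as the (much smaller) corpus for embedding training."""
    rng = random.Random(seed)
    nodes = list(graph)
    walks = []
    for _ in range(num_walks):
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = graph[walk[-1]]
                if not neighbors:
                    break
                words, weights = zip(*neighbors.items())
                # Next step is chosen proportionally to edge weight,
                # so frequent co-occurrences dominate the sampled corpus.
                walk.append(rng.choices(words, weights=weights)[0])
            walks.append(walk)
    return walks
```

Because the graph's size is bounded by the vocabulary rather than the corpus length, the sampling cost stays roughly constant as the raw corpus grows, which is the source of the runtime advantage claimed in the abstract.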

