Syntactic Interchangeability in Word Embedding Models

04/01/2019
by   Daniel Hershcovich, et al.

Nearest neighbors in word embedding models are commonly observed to be semantically similar, but the relations between them can vary greatly. We investigate the extent to which word embedding models preserve syntactic interchangeability, as reflected by distances between word vectors, and the effect of hyper-parameters, in particular context window size. We use part of speech (POS) as a proxy for syntactic interchangeability: generally speaking, words with the same POS are syntactically valid in the same contexts. We also investigate the relationship between interchangeability and similarity as judged by commonly used word similarity benchmarks, and correlate the result with the performance of word embedding models on these benchmarks. Our results can inform future research and applications in the selection of word embedding models, suggesting a principled choice of the context window size parameter depending on the use case.
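The POS-as-proxy idea described above can be sketched with a toy computation (the vectors, vocabulary, and POS tags below are illustrative inventions, not data from the paper): for each word, find its cosine nearest neighbor in the embedding space and check whether the neighbor has the same part of speech.

```python
import numpy as np

# Hypothetical toy vocabulary with POS tags and 3-dimensional "embeddings".
words = ["cat", "dog", "run", "walk", "red", "blue"]
pos = ["NOUN", "NOUN", "VERB", "VERB", "ADJ", "ADJ"]
vecs = np.array([
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.1],
    [0.1, 1.0, 0.0],
    [0.2, 0.9, 0.1],
    [0.0, 0.1, 1.0],
    [0.1, 0.0, 0.9],
])

def pos_agreement(vecs, pos):
    """Fraction of words whose cosine nearest neighbor shares their POS."""
    normed = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    sims = normed @ normed.T          # pairwise cosine similarities
    np.fill_diagonal(sims, -np.inf)   # exclude each word itself
    nearest = sims.argmax(axis=1)     # index of nearest neighbor per word
    return float(np.mean([pos[i] == pos[j] for i, j in enumerate(nearest)]))

print(pos_agreement(vecs, pos))  # 1.0 for this toy data: every neighbor shares POS
```

Running this measure over embeddings trained with different context window sizes would reveal how strongly each setting preserves syntactic interchangeability, which is the kind of comparison the paper performs.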


