Word Sense Disambiguation for 158 Languages using Word Embeddings Only

03/14/2020
by   Varvara Logacheva, et al.
0

Disambiguation of word senses in context is easy for humans, but is a major challenge for automatic approaches. Sophisticated supervised and knowledge-based models were developed to solve this task. However, (i) the inherent Zipfian distribution of supervised training instances for a given word and/or (ii) the quality of linguistic knowledge representations motivate the development of completely unsupervised and knowledge-free approaches to word sense disambiguation (WSD). They are particularly useful for under-resourced languages which do not have any resources for building either supervised and/or knowledge-based models. In this paper, we present a method that takes as input a standard pre-trained word embedding model and induces a fully-fledged word sense inventory, which can be used for disambiguation in context. We use this method to induce a collection of sense inventories for 158 languages on the basis of the original pre-trained fastText word embeddings by Grave et al. (2018), enabling WSD in these languages. Models and system are available online.

READ FULL TEXT

page 5

page 6

research
12/27/2021

"A Passage to India": Pre-trained Word Embeddings for Indian Languages

Dense word vectors or 'word embeddings' which encode semantic properties...
research
01/23/2021

Dictionary-based Debiasing of Pre-trained Word Embeddings

Word embeddings trained on large corpora have shown to encode high level...
research
09/06/2019

To lemmatize or not to lemmatize: how word normalisation affects ELMo performance in word sense disambiguation

We critically evaluate the widespread assumption that deep learning NLP ...
research
04/29/2017

Extending and Improving Wordnet via Unsupervised Word Embeddings

This work presents an unsupervised approach for improving WordNet that b...
research
03/22/2018

Word sense induction using word embeddings and community detection in complex networks

Word Sense Induction (WSI) is the ability to automatically induce word s...
research
07/21/2017

Unsupervised, Knowledge-Free, and Interpretable Word Sense Disambiguation

Interpretability of a predictive model is a powerful feature that gains ...
research
09/17/2018

Unsupervised Sense-Aware Hypernymy Extraction

In this paper, we show how unsupervised sense representations can be use...

Please sign up or login with your details

Forgot password? Click here to reset