Monolingual alignment of word senses and definitions in lexicographical resources

by   Sina Ahmadi, et al.

The focus of this thesis is broadly on the alignment of lexicographical data, particularly dictionaries. In order to tackle some of the challenges in this field, two main tasks of word sense alignment and translation inference are addressed. The first task aims to find an optimal alignment given the sense definitions of a headword in two different monolingual dictionaries. This is a challenging task, especially due to differences in sense granularity, coverage and description in two resources. After describing the characteristics of various lexical semantic resources, we introduce a benchmark containing 17 datasets of 15 languages where monolingual word senses and definitions are manually annotated across different resources by experts. In the creation of the benchmark, lexicographers' knowledge is incorporated through the annotations where a semantic relation, namely exact, narrower, broader, related or none, is selected for each sense pair. This benchmark can be used for evaluation purposes of word-sense alignment systems. The performance of a few alignment techniques based on textual and non-textual semantic similarity detection and semantic relation induction is evaluated using the benchmark. Finally, we extend this work to translation inference where translation pairs are induced to generate bilingual lexicons in an unsupervised way using various approaches based on graph analysis. This task is of particular interest for the creation of lexicographical resources for less-resourced and under-represented languages and also, assists in increasing coverage of the existing resources. From a practical point of view, the techniques and methods that are developed in this thesis are implemented within a tool that can facilitate the alignment task.


Multi-SimLex: A Large-Scale Evaluation of Multilingual and Cross-Lingual Lexical Semantic Similarity

We introduce Multi-SimLex, a large-scale lexical resource and evaluation...

Unsupervised Alignment of Distributional Word Embeddings

Cross-domain alignment play a key roles in tasks ranging from machine tr...

Sense Vocabulary Compression through the Semantic Knowledge of WordNet for Neural Word Sense Disambiguation

In this article, we tackle the issue of the limited quantity of manually...

Look It Up: Bilingual and Monolingual Dictionaries Improve Neural Machine Translation

Despite advances in neural machine translation (NMT) quality, rare words...

Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks

The use of language is subject to variation over time as well as across ...

Unbalanced Optimal Transport for Unbalanced Word Alignment

Monolingual word alignment is crucial to model semantic interactions bet...

Neural semi-Markov CRF for Monolingual Word Alignment

Monolingual word alignment is important for studying fine-grained editin...

Please sign up or login with your details

Forgot password? Click here to reset