Speakers Fill Lexical Semantic Gaps with Context

10/05/2020
by Tiago Pimentel, et al.

Lexical ambiguity is widespread in language, allowing for the reuse of economical word forms and therefore making language more efficient. If ambiguous words cannot be disambiguated from context, however, this gain in efficiency might make language less clear—resulting in frequent miscommunication. For a language to be clear and efficiently encoded, we posit that the lexical ambiguity of a word type should correlate with how much information context provides about it, on average. To investigate whether this is the case, we operationalise the lexical ambiguity of a word as the entropy of meanings it can take, and provide two ways to estimate this—one which requires human annotation (using WordNet), and one which does not (using BERT), making it readily applicable to a large number of languages. We validate these measures by showing that, on six high-resource languages, there are significant Pearson correlations between our BERT-based estimate of ambiguity and the number of synonyms a word has in WordNet (e.g. ρ = 0.40 in English). We then test our main hypothesis—that a word's lexical ambiguity should negatively correlate with its contextual uncertainty—and find significant correlations on all 18 typologically diverse languages we analyse. This suggests that, in the presence of ambiguity, speakers compensate by making contexts more informative.
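To make the operationalisation above concrete, here is a minimal sketch of "lexical ambiguity as the entropy of meanings". It is not the authors' WordNet-based or BERT-based estimator. It assumes we already have a list of sense labels for the occurrences of a word (the labels, the example words, and the function name `meaning_entropy` are hypothetical), and computes the Shannon entropy of the empirical sense distribution in bits.

```python
import math
from collections import Counter


def meaning_entropy(sense_labels):
    """Shannon entropy (in bits) of the empirical distribution over senses.

    `sense_labels` is a hypothetical input: one sense label per observed
    occurrence of a single word type. This is an illustrative assumption,
    not the estimator used in the paper.
    """
    counts = Counter(sense_labels)
    total = sum(counts.values())
    # H = -sum_m p(m) * log2 p(m), with p(m) estimated by relative frequency.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical annotated occurrences of two word types.
bank_senses = ["river", "finance", "finance", "finance"]
sun_senses = ["star", "star", "star", "star"]

print(f"bank: {meaning_entropy(bank_senses):.3f} bits")
print(f"sun:  {meaning_entropy(sun_senses):.3f} bits")
```

Under this sketch, a word observed with a single sense has zero entropy, and a word whose occurrences are spread evenly over more senses has higher entropy, which is the sense in which entropy serves as a measure of ambiguity.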

