Identifying missing dictionary entries with frequency-conserving context models

03/07/2015
by   Jake Ryland Williams, et al.

In an effort to better understand meaning in natural language texts, we explore methods for organizing lexical objects into contexts. A number of these methods fall into a family defined by word ordering. Unlike demographic or spatial partitions of data, these collocation models are of special importance because they apply universally. While we are interested here in text and have framed our treatment accordingly, our work is potentially applicable to other areas of research (e.g., speech, genomics, and mobility patterns) where one has ordered categorical data (e.g., sounds, genes, and locations). Our approach focuses on the phrase (whether a single word or larger) as the primary meaning-bearing lexical unit and object of study. To do so, we employ our previously developed framework for generating word-conserving phrase-frequency data. Upon training our model with the Wiktionary, an extensive, online, collaborative, and open-source dictionary that contains over 100,000 phrasal definitions, we develop highly effective filters for identifying meaningful, missing phrase entries. With our predictions, we then engage the editorial community of the Wiktionary and propose short lists of potential missing entries for definition, developing a breakthrough lexical extraction technique and expanding our knowledge of the defined English lexicon of phrases.
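
The abstract does not spell out how a frequency-conserving context model is built, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes that a context is a phrase with one word replaced by a wildcard, and that each phrase's frequency is split evenly across its contexts so that the total frequency is preserved. The function names, the `*` wildcard, and the toy counts are all hypothetical.

```python
from collections import defaultdict


def contexts(phrase, wildcard="*"):
    """Return every context of a phrase: one word swapped for a wildcard.

    Assumption: this wildcard-based definition of "context" is ours,
    chosen for illustration; the abstract does not define it.
    """
    words = phrase.split()
    return [
        " ".join(words[:i] + [wildcard] + words[i + 1:])
        for i in range(len(words))
    ]


def context_frequencies(phrase_counts):
    """Distribute each phrase's count evenly over its contexts.

    Because every phrase hands out exactly its own count, the sum over
    contexts equals the sum over phrases (frequency is conserved).
    """
    totals = defaultdict(float)
    for phrase, count in phrase_counts.items():
        ctxs = contexts(phrase)
        if not ctxs:  # skip empty phrases to avoid dividing by zero
            continue
        share = count / len(ctxs)
        for ctx in ctxs:
            totals[ctx] += share
    return dict(totals)


# Hypothetical toy phrase counts, not data from the paper.
phrase_counts = {"in the end": 6, "in the beginning": 3, "the end": 4}
freqs = context_frequencies(phrase_counts)
```

In this toy input, the context `in the *` collects an equal share from both `in the end` and `in the beginning`, which is the kind of shared slot that could be used to group phrases when looking for dictionary candidates.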


