Boosting Entity Linking Performance by Leveraging Unlabeled Documents

06/04/2019
by Phong Le, et al.

Modern entity linking systems rely on large collections of documents specifically annotated for the task (e.g., AIDA CoNLL). In contrast, we propose an approach that exploits only naturally occurring information: unlabeled documents and Wikipedia. Our approach consists of two stages. First, we construct a high-recall list of candidate entities for each mention in an unlabeled document. Second, we use the candidate lists as weak supervision to constrain our document-level entity linking model. The model treats entities as latent variables and, when estimated on a collection of unlabeled texts, learns to choose entities relying both on the local context of each mention and on coherence with other entities in the document. The resulting approach rivals fully supervised state-of-the-art systems on standard test sets. It also approaches their performance in a very challenging setting: when tested on a test set sampled from the data used to estimate the supervised systems. By comparing to Wikipedia-only training of our model, we demonstrate that modeling unlabeled documents is beneficial.
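
The abstract does not spell out the training objective, so the following is only a minimal sketch of one way candidate lists could act as weak supervision for a latent entity variable. It assumes a softmax over scored entities and rewards probability mass that falls on the candidate list. The function names, the toy scores, and the entity names are all hypothetical and are not taken from the paper.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def candidate_set_log_likelihood(scores, candidate_ids):
    """Log-probability that the latent entity lies in the candidate list.

    Assumption: scores maps every considered entity (candidates plus
    negatives) to an unnormalized score, and the distribution over
    entities is a softmax of those scores.
    """
    candidate_scores = [scores[e] for e in candidate_ids]
    return logsumexp(candidate_scores) - logsumexp(list(scores.values()))


def predict(scores, candidate_ids):
    """Pick the highest-scoring entity among the candidates."""
    return max(candidate_ids, key=lambda e: scores[e])


# Toy scores for one mention; "Berlin" plays the role of a negative entity.
toy_scores = {"Paris_(France)": 2.0, "Paris_Hilton": 0.5, "Berlin": 1.0}
toy_candidates = ["Paris_(France)", "Paris_Hilton"]

objective = candidate_set_log_likelihood(toy_scores, toy_candidates)
best = predict(toy_scores, toy_candidates)
```

Maximizing this quantity pushes score mass toward the candidate list without saying which candidate is correct. In the approach the abstract describes, that choice is left to the model, which relies on local context and document-level coherence.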


