Entity-Assisted Language Models for Identifying Check-worthy Sentences

by Ting Su, et al.

We propose a new uniform framework for text classification and ranking that can automate the process of identifying check-worthy sentences in political debates and speech transcripts. Our framework combines the semantic analysis of each sentence with additional entity embeddings obtained for the entities identified within that sentence. In particular, we analyse the semantic meaning of each sentence using state-of-the-art neural language models such as BERT, ALBERT, and RoBERTa, while embeddings for entities are obtained from knowledge graph (KG) embedding models. Specifically, we instantiate our framework using five different language models, entity embeddings obtained from six different KG embedding models, and two combination methods, leading to several Entity-Assisted neural language models. We extensively evaluate the effectiveness of our framework using two publicly available datasets from the CLEF 2019 and 2020 CheckThat! Labs. Our results show that the neural language models significantly outperform traditional TF.IDF and LSTM methods. In addition, we show that the ALBERT model is consistently the most effective model among all the tested neural language models. Our entity embeddings significantly outperform existing approaches from the literature that are based on similarity and relatedness scores between the entities in a sentence, when used alongside a KG embedding.
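To make the idea of combining a sentence representation with entity embeddings concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not name its two combination methods, so plain concatenation with mean-pooled entity vectors is an assumption here. The vectors are random placeholders standing in for language model and KG embedding outputs, the logistic scorer is untrained, and the function names `combine` and `check_worthiness_score` are hypothetical.

```python
import numpy as np


def combine(sentence_vec, entity_vecs, entity_dim):
    """Concatenate a sentence vector with the mean of its entity vectors.

    Assumption: concatenation is used only for illustration; a sentence
    with no linked entities gets an all-zero entity part.
    """
    if len(entity_vecs) == 0:
        entity_part = np.zeros(entity_dim)
    else:
        entity_part = np.mean(entity_vecs, axis=0)
    return np.concatenate([sentence_vec, entity_part])


def check_worthiness_score(features, weights, bias):
    """Logistic score in (0, 1); a higher score ranks the sentence higher."""
    return 1.0 / (1.0 + np.exp(-(features @ weights + bias)))


rng = np.random.default_rng(0)
sentence_vec = rng.normal(size=8)      # placeholder for a language model output
entity_vecs = rng.normal(size=(2, 4))  # placeholder for two KG entity embeddings

features = combine(sentence_vec, entity_vecs, entity_dim=4)
weights = rng.normal(size=features.shape[0])  # untrained, illustrative weights
score = check_worthiness_score(features, weights, bias=0.0)
```

Because the scorer returns a value between 0 and 1, the same features could serve both tasks the abstract mentions: thresholding the score for classification, or sorting sentences by it for ranking.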



