Investigating Entropy for Extractive Document Summarization

09/22/2021
by Alka Khurana, et al.

Automatic text summarization aims to cut down readers' time and cognitive effort by reducing the content of a text document without compromising its essence. Ergo, informativeness is the prime attribute of a document summary generated by an algorithm, and selecting sentences that capture the essence of a document is the primary goal of extractive document summarization. In this paper, we employ Shannon entropy to capture the informativeness of sentences. We employ Non-negative Matrix Factorization (NMF) to reveal probability distributions for computing the entropy of terms, topics, and sentences in latent space. We present an information-theoretic interpretation of the computed entropy, which is the bedrock of the proposed E-Summ algorithm, an unsupervised method for extractive document summarization. The algorithm systematically applies information-theoretic principles to select informative sentences from important topics in the document. The proposed algorithm is generic and fast, and hence amenable to summarizing documents in real time. Furthermore, it is domain- and collection-independent and agnostic to the language of the document. Benefiting from non-negative NMF factor matrices, the E-Summ algorithm is also transparent and explainable. We use the standard ROUGE toolkit to evaluate the performance of the proposed method on four well-known public datasets. We also perform a quantitative assessment of E-Summ summary quality by computing its semantic similarity with respect to the original document. Our investigation reveals that although using NMF and an information-theoretic approach for document summarization promises efficient, explainable, and language-independent text summarization, it needs to be bolstered to match the performance of deep neural methods.
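To make the idea of computing entropy over NMF factors concrete, the following is a minimal sketch, not the E-Summ algorithm itself. The abstract does not specify the NMF solver or how entropies are combined into a sentence score, so everything here is an assumption for illustration: the function names `nmf` and `shannon_entropy`, the Lee-Seung multiplicative updates, the toy term-sentence matrix `A`, and the choice of two topics are all hypothetical.

```python
import math
import random

def matmul(X, Y):
    """Multiply two dense matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def nmf(A, k, iters=200, seed=0, eps=1e-9):
    """Approximate non-negative A (m x n) as W (m x k) times H (k x n).

    Uses Lee-Seung multiplicative updates, an assumed solver choice;
    the updates keep every entry of W and H non-negative.
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
    return W, H

def shannon_entropy(weights):
    """Normalize non-negative weights to a distribution and return its entropy in bits."""
    total = sum(weights)
    if total == 0:
        return 0.0
    probs = [w / total for w in weights]
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical toy data: rows are terms, columns are sentences (term counts).
A = [
    [2, 1, 0, 0],
    [1, 2, 0, 1],
    [0, 0, 3, 1],
    [0, 1, 2, 2],
]
k = 2  # assumed number of latent topics
W, H = nmf(A, k)

# Column j of H holds sentence j's topic weights; normalizing the column
# gives a distribution over topics whose entropy can then be computed.
entropies = [shannon_entropy([H[t][j] for t in range(k)]) for j in range(len(A[0]))]
for j, e in enumerate(entropies):
    print(f"sentence {j}: topic entropy = {e:.3f} bits")
```

Because the factors are non-negative, each row or column can be read as unnormalized weights, which is what makes the normalization step meaningful. The same `shannon_entropy` helper could be applied to rows of `W` (terms over topics) in the same way.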


research
08/25/2017

Revisiting the Centroid-based Method: A Strong Baseline for Multi-Document Summarization

The centroid-based model for extractive document summarization is a simp...
research
06/16/2023

I Want This, Not That: Personalized Summarization of Scientific Scholarly Texts

In this paper, we present a proposal for an unsupervised algorithm, P-Su...
research
04/11/2023

LBMT team at VLSP2022-Abmusu: Hybrid method with text correlation and generative models for Vietnamese multi-document summarization

Multi-document summarization is challenging because the summaries should...
research
10/30/2017

Conceptual Text Summarizer: A new model in continuous vector space

Traditional methods of summarization are not cost-effective and possible...
research
05/08/2012

Document summarization using positive pointwise mutual information

The degree of success in document summarization processes depends on the...
research
08/27/2020

MultiGBS: A multi-layer graph approach to biomedical summarization

Automatic text summarization methods generate a shorter version of the i...
research
08/05/2017

Extractive Multi Document Summarization using Dynamical Measurements of Complex Networks

Due to the large amount of textual information available on Internet, it...
