An Enhanced Latent Semantic Analysis Approach for Arabic Document Summarization

07/31/2018
by Kamal Al-Sabahi, et al.

The rapid growth of information on the Internet makes research in automatic document summarization urgent, since summarization is an effective remedy for information overload. Many approaches have been proposed based on different strategies, such as latent semantic analysis (LSA). However, LSA has limitations that diminish its performance when applied to document summarization. In this work, we try to overcome these limitations by combining statistical and linear algebraic approaches with syntactic and semantic processing of text. First, a part-of-speech tagger is used to reduce the dimensionality of the LSA input. Then, the weight of each term in four adjacent sentences is added to the weighting schemes when computing the input matrix, to take word order and syntactic relations into account. In addition, a new LSA-based sentence selection algorithm is proposed, in which the term description is combined with the sentence description for each topic, which in turn makes the generated summary more informative and diverse. To verify the effectiveness of the proposed LSA-based sentence selection algorithm, extensive experiments on Arabic and English are conducted. Four datasets are used to evaluate the new model: the Linguistic Data Consortium (LDC) Arabic Newswire-a corpus, the Essex Arabic Summaries Corpus (EASC), DUC2002, and the Multilingual MSS 2015 dataset. Experimental results on the four datasets show the effectiveness of the proposed model on both Arabic and English. It performs comprehensively better than the state-of-the-art methods.
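The input-matrix step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which four sentences count as adjacent or how their weights are combined, so the sketch assumes two sentences before and two after (`window=2`), raw term frequency as the base weight, a hypothetical damping factor `neighbor_weight`, and cells that stay zero when the term is absent from the sentence itself. Whitespace tokenization stands in for the part-of-speech filtering, which is omitted.

```python
from collections import Counter


def build_input_matrix(sentences, window=2, neighbor_weight=0.5):
    """Build a term-by-sentence matrix with neighbor-aware weights.

    Assumptions (not taken from the paper): whitespace tokenization,
    raw term frequency as the local weight, `window` sentences on each
    side as the "adjacent" sentences, and `neighbor_weight` as a damping
    factor for occurrences in those neighbors.
    """
    tokenized = [s.lower().split() for s in sentences]
    counts = [Counter(tokens) for tokens in tokenized]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n = len(sentences)

    matrix = []
    for term in vocab:
        row = []
        for j in range(n):
            local_tf = counts[j][term]
            if local_tf == 0:
                # Assumed: a term absent from the sentence keeps weight 0,
                # so the matrix stays as sparse as the plain tf matrix.
                row.append(0.0)
                continue
            lo, hi = max(0, j - window), min(n, j + window + 1)
            neighbor_tf = sum(counts[k][term] for k in range(lo, hi) if k != j)
            row.append(local_tf + neighbor_weight * neighbor_tf)
        matrix.append(row)
    return vocab, matrix


# Toy example; the sentences are illustrative only.
sentences = ["the cat sat", "the cat ran", "a dog ran", "the end"]
vocab, matrix = build_input_matrix(sentences)
```

In an LSA pipeline, a matrix like this would then be factored with a singular value decomposition, and sentences would be selected from the resulting topic space. That selection step is not sketched here, because the abstract does not give enough detail to reproduce it.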

research
07/08/2018

Latent Semantic Analysis Approach for Document Summarization Based on Word Embeddings

Since the amount of information on the internet is growing rapidly, it i...
research
04/20/2022

Towards Arabic Sentence Simplification via Classification and Generative Approaches

This paper presents an attempt to build a Modern Standard Arabic (MSA) s...
research
10/11/2012

Artex is AnotheR TEXt summarizer

This paper describes Artex, another algorithm for Automatic Text Summari...
research
12/25/2022

GAE-ISumm: Unsupervised Graph-Based Summarization of Indian Languages

Document summarization aims to create a precise and coherent summary of ...
research
02/07/2017

Effects of Stop Words Elimination for Arabic Information Retrieval: A Comparative Study

The effectiveness of three stop words lists for Arabic Information Retri...
research
05/24/2023

Dolphin: A Challenging and Diverse Benchmark for Arabic NLG

We present Dolphin, a novel benchmark that addresses the need for an eva...
research
02/27/2019

An Editorial Network for Enhanced Document Summarization

We suggest a new idea of Editorial Network - a mixed extractive-abstract...
