Embrace Divergence for Richer Insights: A Multi-document Summarization Benchmark and a Case Study on Summarizing Diverse Information from News Articles

09/17/2023
by Kung-Hsiang Huang, et al.

Previous research in multi-document news summarization has typically concentrated on collating information that all sources agree upon. However, to our knowledge, the summarization of diverse information dispersed across multiple articles about an event has not been previously investigated. This setting imposes a different set of challenges for a summarization model. In this paper, we propose a new task of summarizing diverse information encountered in multiple news articles covering the same event. To facilitate this task, we outline a data collection schema for identifying diverse information and curate a dataset named DiverseSumm. The dataset includes 245 news stories, with each story comprising 10 news articles and paired with a human-validated reference. Moreover, we conduct a comprehensive analysis to pinpoint the position and verbosity biases that arise when Large Language Model (LLM)-based metrics are used to evaluate the coverage and faithfulness of summaries, and we measure how well these metrics correlate with human assessments. We apply our findings to study how LLMs summarize multiple news articles by analyzing which types of diverse information LLMs are capable of identifying. Our analyses suggest that despite the extraordinary capabilities of LLMs in single-document summarization, the proposed task remains a complex challenge for them, mainly due to their limited coverage, with GPT-4 able to cover less than 40% of the diverse information on average.


