SQuALITY: Building a Long-Document Summarization Dataset the Hard Way

05/23/2022
by Alex Wang, et al.

Summarization datasets are often assembled either by scraping naturally occurring public-domain summaries (which are nearly always in technical domains that are difficult to work with) or by using approximate heuristics to extract summaries from everyday text (which frequently yields unfaithful summaries). In this work, we turn to a slower but more straightforward approach to developing summarization benchmark data: we hire highly qualified contractors to read stories and write original summaries from scratch. To amortize reading time, we collect five summaries per document: the first gives an overview, and the subsequent four address specific questions. We use this protocol to collect SQuALITY, a dataset of question-focused summaries built on the same public-domain short stories as the multiple-choice dataset QuALITY (Pang et al., 2021). Experiments with state-of-the-art summarization systems show that our dataset is challenging and that existing automatic evaluation metrics are weak indicators of summary quality.
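The collection protocol fixes a simple per-document shape: one overview summary followed by four question-focused summaries, all written after a single reading of the story. The sketch below is one way to model that shape in Python. The class and field names (`DocumentAnnotation`, `QuestionSummary`, `story_id`, `is_complete`) are hypothetical illustrations, not the format of the released SQuALITY data.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical schema for illustration only; not the released SQuALITY format.
@dataclass
class QuestionSummary:
    question: str  # the specific question this summary addresses
    summary: str   # the contractor's summary written to answer it


@dataclass
class DocumentAnnotation:
    story_id: str  # hypothetical identifier for the source short story
    overview: str  # first summary: a general overview of the story
    question_summaries: List[QuestionSummary] = field(default_factory=list)

    def all_summaries(self) -> List[str]:
        """Return summaries in collection order: overview first, then question-focused ones."""
        return [self.overview] + [qs.summary for qs in self.question_summaries]

    def is_complete(self, n_questions: int = 4) -> bool:
        """Check the record against the five-summaries-per-document protocol."""
        return bool(self.overview) and len(self.question_summaries) == n_questions


# Usage: one reading of a story yields a single record holding all five summaries.
ann = DocumentAnnotation(
    story_id="story-001",
    overview="An overview of the plot.",
    question_summaries=[QuestionSummary(f"Question {i}?", f"Summary {i}.") for i in range(1, 5)],
)
print(ann.is_complete(), len(ann.all_summaries()))
```

Grouping all five summaries under one record mirrors the amortization argument: the cost of reading a story is paid once and spread across five summaries rather than one.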
