SummEval: Re-evaluating Summarization Evaluation

07/24/2020
by Alexander R. Fabbri, et al.

The scarcity of comprehensive, up-to-date studies on evaluation metrics for text summarization and the lack of consensus regarding evaluation protocols continue to inhibit progress. We address the existing shortcomings of summarization evaluation methods along five dimensions: 1) we re-evaluate 12 automatic evaluation metrics in a comprehensive and consistent fashion using neural summarization model outputs along with expert and crowd-sourced human annotations, 2) we consistently benchmark 23 recent summarization models using the aforementioned automatic evaluation metrics, 3) we assemble the largest collection of summaries generated by models trained on the CNN/DailyMail news dataset and share it in a unified format, 4) we implement and share a toolkit that provides an extensible and unified API for evaluating summarization models across a broad range of automatic metrics, 5) we assemble and share the largest and most diverse, in terms of model types, collection of human judgments of model-generated summaries on the CNN/DailyMail dataset, annotated by both expert judges and crowd-sourced workers. We hope that this work will help promote a more complete evaluation protocol for text summarization and advance research on evaluation metrics that correlate better with human judgments.

research
03/18/2023

Revisiting Automatic Question Summarization Evaluation in the Biomedical Domain

Automatic evaluation metrics have been facilitating the rapid developmen...
research
07/10/2020

SacreROUGE: An Open-Source Library for Using and Developing Summarization Evaluation Metrics

We present SacreROUGE, an open-source library for using and developing s...
research
12/15/2022

Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation

Human evaluation is the foundation upon which the evaluation of both sum...
research
05/20/2020

Examining the State-of-the-Art in News Timeline Summarization

Previous work on automatic news timeline summarization (TLS) leaves an u...
research
04/08/2020

Asking and Answering Questions to Evaluate the Factual Consistency of Summaries

Practical applications of abstractive summarization models are limited b...
research
04/15/2021

SummVis: Interactive Visual Analysis of Models, Data, and Evaluation for Text Summarization

Novel neural architectures, training strategies, and the availability of...
research
08/23/2019

Neural Text Summarization: A Critical Evaluation

Text summarization aims at compressing long documents into a shorter for...
