ED-FAITH: Evaluating Dialogue Summarization on Faithfulness

11/15/2022
by Sicong Huang, et al.

Abstractive summarization models typically generate content that is unfaithful to the input, which makes evaluating the faithfulness of generated summaries important. Most faithfulness metrics have been evaluated only on the news domain. Can they be transferred to other summarization tasks? In this work, we first present a systematic study of faithfulness metrics for dialogue summarization. We evaluate common faithfulness metrics on dialogue datasets and observe that most of them correlate poorly with human judgements despite performing well on news datasets. Given these findings, to improve the performance of existing metrics on dialogue summarization, we first fine-tune them on an in-domain dataset, then apply unlikelihood training on negative samples, and show that these steps improve metric performance on dialogue data. Inspired by the strong zero-shot performance of the T0 language model, we further propose T0-Score, a new metric for faithfulness evaluation that shows consistent improvement over baseline metrics across multiple domains.
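
The abstract does not spell out the training objective, so the following is only a minimal sketch of how unlikelihood training on negative samples is commonly formulated, not the authors' implementation. It assumes the model assigns a probability to each summary token: faithful (positive) summaries are trained with the usual negative log-likelihood, while unfaithful (negative) summaries are penalized with -log(1 - p). The function names and the weight `alpha` are hypothetical and chosen for illustration.

```python
import math


def likelihood_loss(token_probs):
    """Average negative log-likelihood over the tokens of a faithful summary."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def unlikelihood_loss(token_probs, eps=1e-12):
    """Average -log(1 - p) over the tokens of an unfaithful summary.

    The loss grows as the model assigns higher probability to the
    unfaithful tokens. eps guards against log(0) when p is exactly 1.
    """
    return -sum(math.log(max(1.0 - p, eps)) for p in token_probs) / len(token_probs)


def combined_loss(pos_probs, neg_probs, alpha=1.0):
    """Likelihood term on positives plus a weighted unlikelihood term on negatives.

    alpha is a hypothetical weighting hyperparameter, not a value from the paper.
    """
    return likelihood_loss(pos_probs) + alpha * unlikelihood_loss(neg_probs)


if __name__ == "__main__":
    # Made-up token probabilities, for illustration only.
    faithful = [0.9, 0.8, 0.95]
    unfaithful = [0.7, 0.6, 0.2]
    print(combined_loss(faithful, unfaithful))
```

In a real setup the probabilities would come from the metric's underlying language model, and the negative samples would be summaries known to contradict the dialogue.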


