A Gold Standard Methodology for Evaluating Accuracy in Data-To-Text Systems

11/08/2020
by Craig Thomson, et al.

Most Natural Language Generation systems need to produce accurate texts. We propose a methodology for high-quality human evaluation of the accuracy of generated texts, which is intended to serve as a gold standard for accuracy evaluations of data-to-text systems. We use our methodology to evaluate the accuracy of computer-generated basketball summaries, and we then show how our gold-standard evaluation can be used to validate automated metrics.
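The abstract does not spell out how the gold-standard evaluation is used to validate automated metrics, so the sketch below is only a rough illustration of one generic approach: correlating the number of errors an automated metric flags in each generated text with the number of errors found by human annotators. The data, the function name validate_metric, and the choice of correlation statistics are assumptions for illustration, not the authors' actual procedure.

# Minimal, hypothetical sketch: compare an automated metric's per-text
# error counts against gold-standard counts from human annotators.
# Data and function name are illustrative assumptions, not the authors' method.
from scipy.stats import pearsonr, spearmanr

def validate_metric(gold_error_counts, metric_error_counts):
    # Correlate the metric's error counts with the human-annotated counts;
    # a strong correlation suggests the metric tracks the gold standard.
    pearson_r, pearson_p = pearsonr(gold_error_counts, metric_error_counts)
    spearman_rho, spearman_p = spearmanr(gold_error_counts, metric_error_counts)
    return {
        "pearson_r": pearson_r,
        "pearson_p": pearson_p,
        "spearman_rho": spearman_rho,
        "spearman_p": spearman_p,
    }

if __name__ == "__main__":
    # Hypothetical error counts for five generated basketball summaries.
    gold = [7, 2, 11, 4, 0]     # errors found by human annotators
    metric = [6, 3, 9, 5, 1]    # errors flagged by an automated metric
    print(validate_metric(gold, metric))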

Related research

All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text (06/30/2021)
Human evaluations are typically considered the gold standard in natural ...

TeTIm-Eval: a novel curated evaluation data set for comparing text-to-image models (12/15/2022)
Evaluating and comparing text-to-image models is a challenging problem. ...

Investigating Text Simplification Evaluation (07/28/2021)
Modern text simplification (TS) heavily relies on the availability of go...

Towards objectively evaluating the quality of generated medical summaries (04/09/2021)
We propose a method for evaluating the quality of generated text by aski...

Shared Task on Evaluating Accuracy in Natural Language Generation (06/22/2020)
We propose a shared task on methodologies and algorithms for evaluating ...

How Human is Human Evaluation? Improving the Gold Standard for NLG with Utility Theory (05/24/2022)
Human ratings are treated as the gold standard in NLG evaluation. The st...

Automatic Accuracy Prediction for AMR Parsing (04/17/2019)
Abstract Meaning Representation (AMR) represents sentences as directed, ...
