Improving Text Generation Evaluation with Batch Centering and Tempered Word Mover Distance

10/13/2020
by Xi Chen, et al.

Recent advances in automatic evaluation metrics for text have shown that deep contextualized word representations, such as those generated by BERT encoders, are helpful for designing metrics that correlate well with human judgements. At the same time, it has been argued that contextualized word representations exhibit sub-optimal statistical properties for encoding the true similarity between words or sentences. In this paper, we present two techniques for improving encoding representations for similarity metrics: a batch-mean centering strategy that improves their statistical properties, and a computationally efficient tempered Word Mover Distance that better fuses the information in the contextualized word representations. We conduct numerical experiments that demonstrate the robustness of our techniques, reporting results over various BERT-backbone learned metrics and achieving state-of-the-art correlation with human ratings on several benchmarks.
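The abstract names the two techniques but gives no formulas, so the following is only a minimal sketch of the general ideas, not the authors' implementation. It assumes that batch-mean centering means subtracting the mean token embedding computed over a batch, and it uses entropy-regularized optimal transport (Sinkhorn iterations with a temperature parameter) as a stand-in for the tempered Word Mover Distance. The function names, the uniform token weights, the cosine cost, and the random vectors standing in for BERT embeddings are all assumptions made for illustration.

```python
import numpy as np


def batch_center(embeddings):
    """Subtract the mean vector taken over every token in the batch.

    embeddings: array of shape (num_tokens_in_batch, dim).
    Assumption: this is one plausible reading of "batch-mean centering".
    """
    return embeddings - embeddings.mean(axis=0, keepdims=True)


def cosine_cost(x, y):
    """Pairwise cost matrix: 1 - cosine similarity between rows of x and y."""
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    yn = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - xn @ yn.T


def sinkhorn_distance(cost, temperature=0.1, n_iters=200):
    """Entropy-regularized transport cost between two uniformly weighted token sets.

    Assumption: a Sinkhorn-style relaxation used here as a stand-in for a
    "tempered" Word Mover Distance; the paper's exact definition may differ.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # uniform mass on the first sentence's tokens
    b = np.full(m, 1.0 / m)  # uniform mass on the second sentence's tokens
    kernel = np.exp(-cost / temperature)
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)
    plan = u[:, None] * kernel * v[None, :]  # approximate transport plan
    return float((plan * cost).sum())


# Toy stand-ins for contextual embeddings: random vectors sharing a large
# common offset, which makes every pair look similar before centering.
rng = np.random.default_rng(0)
reference = rng.normal(size=(5, 8)) + 10.0
candidate = rng.normal(size=(6, 8)) + 10.0

centered = batch_center(np.vstack([reference, candidate]))
reference_c, candidate_c = centered[:5], centered[5:]

raw_cost = cosine_cost(reference, candidate)
centered_cost = cosine_cost(reference_c, candidate_c)
distance = sinkhorn_distance(centered_cost, temperature=0.1)
```

In this toy setup the shared offset pushes all raw cosine costs close to zero, so the cost matrix carries little signal until the batch mean is removed. A lower temperature makes the transport plan sharper (closer to a hard matching), while a higher one spreads mass more evenly across token pairs.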


