BLEURT Has Universal Translations: An Analysis of Automatic Metrics by Minimum Risk Training

07/06/2023
by Yiming Yan, et al.

Automatic metrics play a crucial role in machine translation. Although n-gram-based metrics remain in widespread use, there has been a recent surge in pre-trained model-based metrics that focus on measuring sentence semantics. These neural metrics achieve higher correlations with human evaluations, but they are often regarded as black boxes whose potential biases are difficult to detect. In this study, we systematically analyze and compare various mainstream and cutting-edge automatic metrics from the perspective of their guidance for training machine translation systems. Through Minimum Risk Training (MRT), we find that certain metrics exhibit robustness defects, such as the presence of universal adversarial translations in BLEURT and BARTScore. In-depth analysis suggests two main causes of these robustness defects: distribution biases in the training datasets, and tendencies inherent in the metric paradigm. By incorporating token-level constraints, we enhance the robustness of evaluation metrics, which in turn improves the performance of machine translation systems. Code is available at <https://github.com/powerpuffpomelo/fairseq_mrt>.
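To make the role of MRT concrete, the following is a minimal sketch of the standard MRT expected-risk computation for a single source sentence, not the authors' implementation. It assumes the candidate translations have already been sampled and scored. The function name `mrt_expected_risk`, the cost definition `1 - score`, and the default sharpness `alpha` are illustrative assumptions; the linked repository may differ on all of them.

```python
import math


def mrt_expected_risk(log_probs, metric_scores, alpha=0.005):
    """Expected risk over a set of candidate translations (illustrative).

    log_probs:     model log-probabilities log P(y'|x), one per candidate
    metric_scores: metric(y', reference) per candidate, assumed to lie in
                   [0, 1] with higher meaning better (an assumption)
    alpha:         sharpness of the renormalized distribution (assumed value)
    """
    if len(log_probs) != len(metric_scores):
        raise ValueError("one metric score is required per candidate")
    if not log_probs:
        raise ValueError("at least one candidate is required")

    # Q(y'|x) is proportional to P(y'|x)^alpha, renormalized over the
    # candidates. Subtracting the maximum keeps the exponentials stable.
    scaled = [alpha * lp for lp in log_probs]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    q = [w / total for w in weights]

    # Cost of a candidate: 1 - metric score (assumed cost definition).
    costs = [1.0 - s for s in metric_scores]

    # Expected risk: sum over candidates of Q(y'|x) * cost(y').
    return sum(qi * ci for qi, ci in zip(q, costs))
```

Minimizing this quantity shifts probability mass toward whichever candidates the metric scores highest. Under that reading, if a metric assigns a high score to some degenerate output regardless of the source, training could plausibly drift toward that output, which is one way to understand how MRT can surface universal adversarial translations.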


research
11/17/2021

Minimum Bayes Risk Decoding with Neural Metrics of Translation Quality

This work applies Minimum Bayes Risk (MBR) decoding to optimize diverse ...
research
06/11/2020

Tangled up in BLEU: Reevaluating the Evaluation of Automatic Machine Translation Evaluation Metrics

Automatic metrics are fundamental for the development and evaluation of ...
research
08/25/2023

Training and Meta-Evaluating Machine Translation Evaluation Metrics at the Paragraph Level

As research on machine translation moves to translating text beyond the ...
research
02/10/2022

Identifying Weaknesses in Machine Translation Metrics Through Minimum Bayes Risk Decoding: A Case Study for COMET

Neural metrics have achieved impressive correlation with human judgement...
research
05/19/2023

The Inside Story: Towards Better Understanding of Machine Translation Neural Evaluation Metrics

Neural metrics for machine translation evaluation, such as COMET, exhibi...
research
08/10/2015

Improve the Evaluation of Fluency Using Entropy for Machine Translation Evaluation Metrics

The widely-used automatic evaluation metrics cannot adequately reflect t...
research
12/20/2022

BMX: Boosting Machine Translation Metrics with Explainability

State-of-the-art machine translation evaluation metrics are based on bla...
