CODET: A Benchmark for Contrastive Dialectal Evaluation of Machine Translation

05/26/2023
by Md Mahfuz ibn Alam, et al.

Neural machine translation (NMT) systems exhibit limited robustness in handling source-side linguistic variations. Their performance tends to degrade when faced with even slight deviations in language usage, such as different domains or variations introduced by second-language speakers. It is intuitive to extend this observation to dialectal variations as well, but work that allows the community to evaluate MT systems on this dimension is limited. To alleviate this issue, we compile and release CODET, a contrastive dialectal benchmark encompassing 882 different variations from nine different languages. We also quantitatively demonstrate the challenges large MT models face in effectively translating dialectal variants. We are releasing all code and data.

research
04/13/2020

Neural Machine Translation: Challenges, Progress and Future

Machine translation (MT) is a technique that leverages computers to tran...
research
07/02/2019

Improving Robustness in Real-World Neural Machine Translation Engines

As a commercial provider of machine translation, we are constantly train...
research
12/14/2016

How Grammatical is Character-level Neural Machine Translation? Assessing MT Quality with Contrastive Translation Pairs

Analysing translation quality in regards to specific linguistic phenomen...
research
05/09/2022

CoCoA-MT: A Dataset and Benchmark for Contrastive Controlled MT with Application to Formality

The machine translation (MT) task is typically formulated as that of ret...
research
06/22/2021

On the Evaluation of Machine Translation for Terminology Consistency

As neural machine translation (NMT) systems become an important part of ...
research
06/08/2018

Findings of the Second Workshop on Neural Machine Translation and Generation

This document describes the findings of the Second Workshop on Neural Ma...
research
05/25/2022

Machine Translation Robustness to Natural Asemantic Variation

We introduce and formalize an under-studied linguistic phenomenon we cal...
