Document-aligned Japanese-English Conversation Parallel Corpus

12/11/2020
by   Matīss Rikters, et al.

Sentence-level (SL) machine translation (MT) has reached acceptable quality for many high-resource languages, but document-level (DL) MT has not: it is difficult to 1) train with the small amount of DL data available; and 2) evaluate, as the main methods and data sets focus on SL evaluation. To address the first issue, we present a document-aligned Japanese-English conversation corpus, including balanced, high-quality business conversation data for tuning and testing. As for the second issue, we manually identify the main areas where SL MT fails to produce adequate translations for lack of context. We then create an evaluation set in which these phenomena are annotated to facilitate automatic evaluation of DL systems. We train MT models using our corpus to demonstrate how using context leads to improvements.


