Neural Machine Translation Data Generation and Augmentation using ChatGPT

07/11/2023
by   Wayne Yang, et al.

Neural models have revolutionized the field of machine translation, but creating parallel corpora remains expensive and time-consuming. We investigate an alternative to manual parallel corpora: hallucinated parallel corpora created by generative language models. Although these models are themselves trained on parallel data, they can leverage a multilingual vector space to create new data, and may be able to supplement small manually procured corpora. Our experiments yield two key findings: the hallucinated data lacks diversity, yet it still improves the translation signal, even when its domain clashes with that of the original dataset.
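The augmentation idea above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: `hallucinate_pairs` is a hypothetical stand-in for prompting a generative model such as ChatGPT for source/target sentence pairs, and the mixing step simply concatenates synthetic pairs with the manually procured corpus.

```python
# Hedged sketch: augmenting a small parallel corpus with "hallucinated"
# pairs. `hallucinate_pairs` is a hypothetical placeholder for a real
# generative-LM call; here it fabricates trivial variants so the code runs.

def hallucinate_pairs(seed_sentences, n_per_seed=2):
    """Stub for a generative model: a real implementation would prompt an
    LLM to produce (source, target) sentence pairs from each seed."""
    pairs = []
    for s in seed_sentences:
        for i in range(n_per_seed):
            # Placeholder output standing in for model-generated text.
            pairs.append((f"{s} [source variant {i}]",
                          f"{s} [target variant {i}]"))
    return pairs

def augment_corpus(manual_pairs, seed_sentences, n_per_seed=2):
    """Mix the manually procured parallel pairs with hallucinated ones,
    keeping the original data intact."""
    return list(manual_pairs) + hallucinate_pairs(seed_sentences, n_per_seed)

manual = [("Hello.", "Bonjour.")]
augmented = augment_corpus(manual, ["Good morning."], n_per_seed=2)
```

In a real setting the augmented corpus would then be fed to a standard NMT training loop; the paper's finding suggests the synthetic pairs help even when their domain differs from the manual data's.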


research
07/26/2023

Data Augmentation for Neural Machine Translation using Generative Language Model

Despite the rapid growth in model architecture, the scarcity of large pa...
research
10/19/2018

Impact of Corpora Quality on Neural Machine Translation

Large parallel corpora that are automatically obtained from the web, doc...
research
05/28/2018

A Stochastic Decoder for Neural Machine Translation

The process of translation is ambiguous, in that there are typically man...
research
10/11/2021

Using Document Similarity Methods to create Parallel Datasets for Code Translation

Translating source code from one programming language to another is a cr...
research
09/13/2021

Graph Algorithms for Multiparallel Word Alignment

With the advent of end-to-end deep learning approaches in machine transl...
research
05/14/2019

A Survey of Multilingual Neural Machine Translation

We present a survey on multilingual neural machine translation (MNMT), w...
research
07/24/2021

MDQE: A More Accurate Direct Pretraining for Machine Translation Quality Estimation

It is expensive to evaluate the results of Machine Translation(MT), whic...
