On the Copying Problem of Unsupervised NMT: A Training Schedule with a Language Discriminator Loss

05/26/2023
by   Yihong Liu, et al.

Although unsupervised neural machine translation (UNMT) has achieved success in many language pairs, the copying problem, i.e., directly copying some parts of the input sentence as the translation, is common among distant language pairs, especially when low-resource languages are involved. We find this issue is closely related to an unexpected copying behavior during online back-translation (BT). In this work, we propose a simple but effective training schedule that incorporates a language discriminator loss. The loss imposes constraints on the intermediate translation so that the translation is in the desired language. By conducting extensive experiments on different language pairs, including similar and distant, high and low-resource languages, we find that our method alleviates the copying problem, thus improving the translation performance on low-resource languages.
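The idea of constraining the intermediate translation can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a hypothetical language classifier that outputs one logit per language for the intermediate translation produced during back-translation, and it assumes the discriminator term is simply added to the back-translation loss with a hypothetical weight. The names `language_discriminator_loss`, `combined_loss`, and `weight` are illustrative only.

```python
import math


def language_discriminator_loss(logits, target_lang_index):
    """Cross-entropy of a language classifier's prediction against the desired language.

    `logits` holds one score per language (an assumed classifier output).
    The loss is small when the classifier believes the intermediate
    translation is in the desired language, and large when it is not
    (for example, when the input sentence was copied).
    """
    # Numerically stable log-sum-exp over the language logits.
    highest = max(logits)
    log_sum_exp = highest + math.log(sum(math.exp(x - highest) for x in logits))
    # Negative log-probability of the desired language.
    return log_sum_exp - logits[target_lang_index]


def combined_loss(bt_loss, logits, target_lang_index, weight=1.0):
    """Back-translation loss plus a weighted discriminator term (assumed additive form)."""
    return bt_loss + weight * language_discriminator_loss(logits, target_lang_index)


# Index 0 stands for the source language, index 1 for the desired target language.
copied = language_discriminator_loss([4.0, -1.0], target_lang_index=1)
translated = language_discriminator_loss([-1.0, 4.0], target_lang_index=1)
print(copied > translated)
```

In this toy setting, an intermediate translation that the classifier assigns to the source language incurs a larger penalty than one it assigns to the target language, which is the kind of pressure against copying that the abstract describes.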
