Unsupervised Translation of German–Lower Sorbian: Exploring Training and Novel Transfer Methods on a Low-Resource Language

09/24/2021
by   Lukas Edman, et al.
This paper describes the methods behind the systems submitted by the University of Groningen for the WMT 2021 Unsupervised Machine Translation task for German–Lower Sorbian (DE–DSB), a pairing of a high-resource language with a low-resource one. Our system uses a transformer encoder–decoder architecture in which we make three changes to the standard training procedure. First, our training focuses on two languages at a time, in contrast with the large body of research on multilingual systems. Second, we introduce a novel method for initializing the vocabulary of an unseen language, achieving improvements of 3.2 BLEU for DE→DSB and 4.0 BLEU for DSB→DE. Lastly, we experiment with the order in which offline and online back-translation are used to train an unsupervised system, finding that applying online back-translation first works better for DE→DSB by 2.76 BLEU. Our submissions ranked first (tied with another team) for DSB→DE and third for DE→DSB.
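The distinction between offline and online back-translation that the abstract refers to can be sketched as follows. This is a toy illustration, not the paper's code: `translate` and `train_step` are hypothetical stand-ins for a real translation model and optimizer step. The key difference is that offline back-translation generates an entire synthetic corpus with a frozen model before training on it, while online back-translation regenerates synthetic pairs batch by batch with the current (improving) model.

```python
def translate(model, sentences, direction):
    # Hypothetical stand-in: apply a (toy) translation function per sentence.
    return [model[direction](s) for s in sentences]

def train_step(model, pairs):
    # Hypothetical stand-in: a real system would update model weights here;
    # the toy model just counts how many synthetic pairs it has seen.
    model["pairs_seen"] += len(pairs)
    return model

def offline_backtranslation(model, mono_dsb, mono_de, rounds=2):
    # Offline: translate the full monolingual corpora with a frozen model,
    # then train on the resulting synthetic corpus; repeat for fixed rounds.
    for _ in range(rounds):
        synthetic_de = translate(model, mono_dsb, "dsb->de")
        synthetic_dsb = translate(model, mono_de, "de->dsb")
        model = train_step(model, list(zip(synthetic_de, mono_dsb)))
        model = train_step(model, list(zip(synthetic_dsb, mono_de)))
    return model

def online_backtranslation(model, mono_dsb, mono_de, steps=4):
    # Online: back-translate one batch at a time with the current model,
    # so the synthetic data improves as training progresses.
    for i in range(steps):
        batch_dsb = [mono_dsb[i % len(mono_dsb)]]
        batch_de = [mono_de[i % len(mono_de)]]
        model = train_step(
            model, list(zip(translate(model, batch_dsb, "dsb->de"), batch_dsb)))
        model = train_step(
            model, list(zip(translate(model, batch_de, "de->dsb"), batch_de)))
    return model
```

In this sketch the two procedures differ only in when translations are generated relative to training; in the actual systems, the ordering of these two phases is what the paper's ablation varies.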


