Low-Resource Machine Translation for Low-Resource Languages: Leveraging Comparable Data, Code-Switching and Compute Resources

03/24/2021
by Garry Kuwanto, et al.

We conduct an empirical study of unsupervised neural machine translation (NMT) for truly low-resource languages, exploring the case when both parallel training data and compute resources are lacking, reflecting the reality for most of the world's languages and the researchers working on them. We propose a simple and scalable method to improve unsupervised NMT, showing how adding comparable data mined using a bilingual dictionary, along with a modest amount of additional compute for training, can significantly improve performance. We also demonstrate how using the dictionary to code-switch monolingual data, creating still more comparable data, further improves performance. With this weak supervision, our best method achieves BLEU scores that improve over supervised results for English→Gujarati (+18.88), English→Kazakh (+5.84), and English→Somali (+1.16), showing the promise of weakly-supervised NMT for the world's many low-resource languages with modest compute resources. To the best of our knowledge, our work is the first to quantitatively showcase the impact of different modest compute resources in low-resource NMT.
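The abstract describes two dictionary-driven ways to create weakly-supervised training data: mining comparable sentence pairs and code-switching monolingual text. The sketch below illustrates both ideas in miniature; the lexicon entries, function names, and the simple overlap score are illustrative assumptions, not the paper's actual pipeline.

```python
import random

# Toy English->Somali lexicon (hypothetical entries, for illustration only).
EN_SO_LEXICON = {"water": "biyo", "school": "dugsi", "book": "buug"}


def overlap_score(src_sentence: str, tgt_sentence: str, lexicon: dict) -> float:
    """Crude comparability score: fraction of dictionary-translatable source
    words whose translation appears in the candidate target sentence."""
    src_tokens = src_sentence.lower().split()
    tgt_tokens = set(tgt_sentence.lower().split())
    translatable = [lexicon[t] for t in src_tokens if t in lexicon]
    if not translatable:
        return 0.0
    hits = sum(1 for t in translatable if t in tgt_tokens)
    return hits / len(translatable)


def code_switch(sentence: str, lexicon: dict, p: float = 0.5, seed: int = 0) -> str:
    """Replace each dictionary word with its translation with probability p,
    turning monolingual text into synthetic comparable data."""
    rng = random.Random(seed)
    out = []
    for tok in sentence.split():
        trans = lexicon.get(tok.lower())
        out.append(trans if trans is not None and rng.random() < p else tok)
    return " ".join(out)


# Mining: rank candidate target sentences against an English source sentence.
print(overlap_score("the book is at the school", "buugga waa dugsi buug", EN_SO_LEXICON))

# Code-switching: with p=1.0 every dictionary word is swapped.
print(code_switch("the book is at the school", EN_SO_LEXICON, p=1.0))
# -> "the buug is at the dugsi"
```

In practice the paper mines comparable data at corpus scale and code-switches large monolingual corpora; the scoring and replacement logic here is only a minimal stand-in for those steps.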


Related research

11/30/2021 - Low-Resource Machine Translation Training Curriculum Fit for Low-Resource Languages
08/30/2019 - Handling Syntactic Divergence in Low-resource Machine Translation
04/09/2020 - On optimal transformer depth for low-resource language translation
11/07/2019 - Low-Resource Machine Translation using Interlinear Glosses
11/05/2019 - Data Diversification: An Elegant Strategy For Neural Machine Translation
10/06/2021 - The Low-Resource Double Bind: An Empirical Study of Pruning for Low-Resource Machine Translation
11/14/2020 - Iterative Self-Learning for Enhanced Back-Translation in Low Resource Neural Machine Translation
