Selecting Parallel In-domain Sentences for Neural Machine Translation Using Monolingual Texts

Continuously-growing data volumes lead to larger generic models. Specific use-cases are usually left out, since generic models tend to perform poorly in domain-specific cases. Our work addresses this gap with a method for selecting in-domain data from generic-domain (parallel text) corpora, for the task of machine translation. The proposed method ranks sentences in parallel general-domain data according to their cosine similarity with a monolingual domain-specific data set. We then select the top K sentences with the highest similarity score to train a new machine translation system tuned to the specific in-domain data. Our experimental results show that models trained on this in-domain data outperform models trained on generic or a mixture of generic and domain data. That is, our method selects high-quality domain-specific training instances at low computational cost and data size.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/01/2018

A Survey of Domain Adaptation for Neural Machine Translation

Neural machine translation (NMT) is a deep learning based approach for m...
research
11/08/2018

The Generic SysML/KAOS Domain Metamodel

This paper is related to the generalised/generic version of the SysML/KA...
research
08/27/2019

Unsupervised Domain Adaptation for Neural Machine Translation with Domain-Aware Feature Embeddings

The recent success of neural machine translation models relies on the av...
research
10/01/2019

Essentia: Mining Domain-specific Paraphrases with Word-Alignment Graphs

Paraphrases are important linguistic resources for a wide variety of NLP...
research
10/06/2020

Iterative Domain-Repaired Back-Translation

In this paper, we focus on the domain-specific translation with low reso...
research
10/18/2022

Domain Specific Sub-network for Multi-Domain Neural Machine Translation

This paper presents Domain-Specific Sub-network (DoSS). It uses a set of...

Please sign up or login with your details

Forgot password? Click here to reset