English-Twi Parallel Corpus for Machine Translation

03/29/2021
by Paul Azunre, et al.
We present a parallel training corpus of 25,421 sentence pairs for machine translation between English and Akuapem Twi. We used a transformer-based translator to generate initial Akuapem Twi translations, which native speakers then verified and, where necessary, corrected to eliminate translationese. In addition, we provide 697 higher-quality crowd-sourced sentences as an evaluation set for downstream Natural Language Processing (NLP) tasks. The larger human-verified dataset is intended for further training of Akuapem Twi machine translation models, while the 697-sentence crowd-sourced dataset is recommended as a test set for English-to-Twi and Twi-to-English models. The Twi side of the crowd-sourced data may also be used for other tasks, such as representation learning and classification. We fine-tune the transformer translation model on the training corpus and report benchmarks on the crowd-sourced test set.


