Unsupervised Pidgin Text Generation By Pivoting English Data and Self-Training

03/18/2020
by Ernie Chang, et al.

West African Pidgin English is widely spoken in West Africa, with at least 75 million speakers. Nevertheless, proper machine translation systems and relevant NLP datasets for Pidgin English are virtually absent. In this work, we develop techniques targeted at bridging the gap between Pidgin English and English in the context of natural language generation, specifically in the area of data-to-text generation. By building upon the previously released monolingual Pidgin English text and a parallel English data-to-text corpus, we aim to build a system that can automatically generate Pidgin English descriptions from structured data. We first train a data-to-English text generation system, then employ techniques from unsupervised neural machine translation and self-training to establish the Pidgin-to-English cross-lingual alignment. Human evaluation of the generated Pidgin texts shows that, though still far from being practically usable, the pivoting + self-training technique improves both Pidgin text fluency and relevance.
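The pivoting idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the template generator, the word-by-word translator, the coverage-based confidence score, the threshold, and every name below are hypothetical stand-ins, and the lexicon entries are illustrative placeholders rather than verified Pidgin. The sketch only shows the data flow: structured data is first rendered as English, the English is then mapped to Pidgin, and confident outputs are kept as pseudo-parallel pairs that a self-training round could learn from.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Translator = Callable[[str], Tuple[str, float]]


def data_to_english(record: Record) -> str:
    """Toy stand-in for a trained data-to-English generator (a fixed template)."""
    return f"{record['name']} is a {record['food']} restaurant"


def make_translator(lexicon: Dict[str, str]) -> Translator:
    """Build a toy word-by-word English-to-Pidgin translator.

    The confidence score is the fraction of tokens found in the lexicon.
    This is an assumption for illustration; a real system would use an
    unsupervised NMT model and its own scoring.
    """
    def translate(sentence: str) -> Tuple[str, float]:
        tokens = sentence.split()
        if not tokens:
            return "", 0.0
        covered = sum(1 for token in tokens if token in lexicon)
        output = " ".join(lexicon.get(token, token) for token in tokens)
        return output, covered / len(tokens)
    return translate


def pivot_generate(record: Record, translate: Translator) -> Tuple[str, float]:
    """Pivot through English: data -> English -> Pidgin."""
    return translate(data_to_english(record))


def collect_pseudo_pairs(
    records: List[Record], translate: Translator, threshold: float
) -> List[Tuple[Record, str]]:
    """Keep (data, Pidgin) pairs whose confidence reaches the threshold.

    In a self-training round, pairs like these would be used to retrain
    the generator before repeating; the retraining step is omitted here.
    """
    pairs = []
    for record in records:
        pidgin, confidence = pivot_generate(record, translate)
        if confidence >= threshold:
            pairs.append((record, pidgin))
    return pairs


# Hypothetical placeholder lexicon and record, for illustration only.
toy_lexicon = {"is": "be", "a": "one"}
translate = make_translator(toy_lexicon)
records = [{"name": "Aromi", "food": "Chinese"}]
pairs = collect_pseudo_pairs(records, translate, threshold=0.3)
```

Filtering by a confidence threshold is one common way to limit the noise that self-training can otherwise reinforce; the value used here is arbitrary.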


