Tone prediction and orthographic conversion for Basaa

10/13/2022
by Ilya Nikitin, et al.

In this paper, we present a seq2seq approach for transliterating missionary Basaa orthographies into the official orthography. Our model uses corpora of Basaa missionary and official orthographies pre-trained with BERT. Since Basaa is a low-resource language, we chose the mT5 model for our project. Before training, we pre-processed our corpora by eliminating one-to-one correspondences between spellings and by unifying characters variably written as either one or two characters into a single-character form. Our best mT5 model achieved a CER of 12.6747 and a WER of 40.1012.
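
The following is a minimal, hypothetical sketch of the pipeline the abstract describes, not the authors' code: Unicode NFC normalization stands in for the one-or-two-character unification step, a single fine-tuning step of mT5 (via Hugging Face transformers) stands in for training on missionary-to-official pairs, and CER/WER are computed by edit distance. The checkpoint, example pair, and hyperparameters are all illustrative assumptions.

```python
# Illustrative sketch only: character unification, one mT5 training step,
# and CER/WER scoring. Checkpoint, data, and hyperparameters are assumptions.
import unicodedata
import torch
from transformers import AutoTokenizer, MT5ForConditionalGeneration

def edit_distance(a, b):
    """Levenshtein distance between two sequences (characters or words)."""
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (x != y))
    return dp[-1]

def cer(hyp, ref):
    return 100.0 * edit_distance(hyp, ref) / len(ref)

def wer(hyp, ref):
    return 100.0 * edit_distance(hyp.split(), ref.split()) / len(ref.split())

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

# Placeholder pair (not attested Basaa). NFC folds base-letter plus
# combining-mark sequences into single precomposed characters where
# Unicode defines them, e.g. "e" + U+0300 -> "è".
src = unicodedata.normalize("NFC", "me\u0300 nlo\u0301")  # missionary spelling
tgt = "mè nló"                                            # official orthography

inputs = tokenizer(src, return_tensors="pt")
labels = tokenizer(tgt, return_tensors="pt").input_ids

# One fine-tuning step; a real run would loop over the full parallel corpus.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model.train()
optimizer.zero_grad()
loss = model(**inputs, labels=labels).loss
loss.backward()
optimizer.step()

# Decode a hypothesis and score it against the reference.
model.eval()
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
hyp = tokenizer.decode(out[0], skip_special_tokens=True)
print(f"CER {cer(hyp, tgt):.4f}  WER {wer(hyp, tgt):.4f}")
```

In practice, CER and WER would be averaged over a held-out test set of parallel sentences rather than computed on a single pair.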

