A Swiss German Dictionary: Variation in Speech and Writing

03/31/2020
by Larissa Schmidt, et al.

We introduce a dictionary containing forms of common words in various Swiss German dialects, normalized into High German. As Swiss German is, for now, a predominantly spoken language, there is significant variation in its written forms, even between speakers of the same dialect. To alleviate the uncertainty associated with this diversity, we complement the Swiss German - High German word pairs with phonetic transcriptions (SAMPA) of the Swiss German forms. The dictionary thus becomes the first resource to combine large-scale spontaneous translation with phonetic transcriptions. Moreover, we control for regional distribution and ensure equal representation of the major Swiss dialects. The coupling of the phonetic and written Swiss German forms is powerful: we show that the coupled forms are sufficient to train a Transformer-based phoneme-to-grapheme model that generates credible novel Swiss German writings. In addition, we show that the inverse mapping, from graphemes to phonemes, can be modeled with a Transformer trained on the new dictionary. Generating pronunciations for previously unknown words in this way is essential for training extensible automatic speech recognition (ASR) systems, which are key beneficiaries of this dictionary.
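As a rough illustration of the kind of record the abstract describes, the sketch below pairs a High German normalization with one written Swiss German form and its SAMPA transcription, then derives sequence pairs of the sort a grapheme-to-phoneme model could be trained on. The `Entry` class, its field names, the dialect labels, and all example rows (including the SAMPA strings) are assumptions made up for illustration; they are not taken from the dictionary and do not reflect its actual file format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One hypothetical dictionary row (field names are assumptions)."""
    high_german: str   # normalized High German form
    dialect: str       # dialect region label
    swiss_german: str  # one written Swiss German form
    sampa: str         # SAMPA symbols, space-separated


# Invented placeholder rows, NOT real entries from the dictionary.
ENTRIES = [
    Entry("Haus", "Zurich", "Huus", "h u: s"),
    Entry("Haus", "Bern", "Hus", "h u: s"),
    Entry("Kind", "Zurich", "Chind", "x I n d"),
]


def written_variants(entries):
    """Group the written Swiss German forms by their High German normalization."""
    variants = defaultdict(set)
    for entry in entries:
        variants[entry.high_german].add(entry.swiss_german)
    return dict(variants)


def g2p_pairs(entries):
    """Build (grapheme sequence, phoneme sequence) training pairs.

    Swapping the two elements of each pair gives phoneme-to-grapheme data.
    """
    return [
        (list(entry.swiss_german.lower()), entry.sampa.split())
        for entry in entries
    ]
```

Grouping by the High German form makes the spelling variation visible for each word, while the sequence pairs are the usual input shape for a sequence-to-sequence model in either direction.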


