SwissDial: Parallel Multidialectal Corpus of Spoken Swiss German

03/21/2021
by   Pelin Dogan-Schönberger, et al.

Swiss German is a dialect continuum whose natively acquired dialects differ significantly from the formal variety of the language. These dialects are used mostly for spoken communication and have no standard orthography. This has led to a lack of annotated datasets, rendering many NLP methods infeasible. In this paper, we introduce the first annotated parallel corpus of spoken Swiss German across 8 major dialects, plus a Standard German reference. Our goal is to create and make available a basic dataset for data-driven NLP applications in Swiss German. We present our data collection procedure in detail and validate the quality of our corpus through experiments with recent neural models for speech synthesis.

research
04/21/2022

German Parliamentary Corpus (GerParCor)

Parliamentary debates represent a large and partly unexploited treasure ...
research
10/06/2020

Swiss Parliaments Corpus, an Automatically Aligned Swiss German Speech to Standard German Text Corpus

We present a forced sentence alignment procedure for Swiss German speech...
research
05/30/2023

STT4SG-350: A Speech Corpus for All Swiss German Dialect Regions

We present STT4SG-350 (Speech-to-Text for Swiss German), a corpus of Swi...
research
05/19/2022

Curras + Baladi: Towards a Levantine Corpus

The processing of the Arabic language is a complex field of research. Th...
research
05/02/2022

TuGeBiC: A Turkish German Bilingual Code-Switching Corpus

In this paper we describe the process of collection, transcription, and ...
research
05/27/2022

Who is we? Disambiguating the referents of first person plural pronouns in parliamentary debates

This paper investigates the use of first person plural pronouns as a rhe...
