SDS-200: A Swiss German Speech to Standard German Text Corpus

05/19/2022
by Michel Plüss, et al.

We present SDS-200, a corpus of Swiss German dialectal speech with Standard German text translations, annotated with the dialect, age, and gender of each speaker. The dataset can be used to train speech translation, dialect recognition, and speech synthesis systems, among others. The data was collected using a web recording tool that is open to the public. Each participant was given a text in Standard German and asked to translate it into their Swiss German dialect before recording it. To increase the corpus quality, recordings were validated by other participants. The data consists of 200 hours of speech by around 4000 different speakers and covers a large part of the Swiss German dialect landscape. We release SDS-200 alongside a baseline speech translation model, which achieves a word error rate (WER) of 30.3 and a BLEU score of 53.1 on the SDS-200 test set. Furthermore, we use SDS-200 to fine-tune a pre-trained XLS-R model, achieving 21.6 WER and 64.0 BLEU.
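For readers unfamiliar with the WER metric quoted above, the following is a minimal sketch of how a word error rate is commonly computed: the word-level edit distance between a reference text and a system output, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the abstract does not describe the authors' evaluation tooling or text normalization, so this is not their implementation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenization is plain whitespace splitting, with no
    casing or punctuation normalization (an assumption, not the paper's setup).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

Multiplying the returned fraction by 100 gives a value on the same scale as the figures in the abstract. For example, `word_error_rate("das ist ein test", "das ist test")` counts one deleted word out of four reference words.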

research
05/30/2023

STT4SG-350: A Speech Corpus for All Swiss German Dialect Regions

We present STT4SG-350 (Speech-to-Text for Swiss German), a corpus of Swi...
research
01/17/2023

2nd Swiss German Speech to Standard German Text Shared Task at SwissText 2022

We present the results and findings of the 2nd Swiss German speech to St...
research
12/15/2014

A Broadcast News Corpus for Evaluation and Tuning of German LVCSR Systems

Transcription of broadcast news is an interesting and challenging applic...
research
12/19/2019

Developing a Multi-Platform Speech Recording System Toward Open Service of Building Large-Scale Speech Corpora

This paper briefly reports our ongoing attempt at the development of a m...
research
08/16/2023

Classifying Dementia in the Presence of Depression: A Cross-Corpus Study

Automated dementia screening enables early detection and intervention, r...
research
07/01/2022

Swiss German Speech to Text system evaluation

We present an in-depth evaluation of four commercially available Speech-...
research
05/31/2023

Text-to-Speech Pipeline for Swiss German – A comparison

In this work, we studied the synthesis of Swiss German speech using diff...
