TuGeBiC: A Turkish German Bilingual Code-Switching Corpus

In this paper we describe the collection, transcription, and annotation of recordings of spontaneous speech from Turkish-German bilinguals, and the compilation of these recordings into a corpus called TuGeBiC. Participants in the study were adult Turkish-German bilinguals living in Germany or Turkey at the time of recording, in the first half of the 1990s. The data were manually tokenised and normalised, and all proper names (names of participants and places mentioned in the conversations) were replaced with pseudonyms. Token-level automatic language identification was performed, which made it possible to establish the proportion of words from each language. The corpus is roughly balanced between the two languages. We also present quantitative information about the number of code-switches, and give examples of the different types of code-switching found in the data. The resulting corpus has been made freely available to the research community.
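
As a rough illustration of how token-level language tags can yield both the per-language word proportions and a count of code-switches, here is a minimal Python sketch. It is not the procedure used for TuGeBiC: the tag names ("TR", "DE", "OTHER"), the function names, and the definition of a switch as any change of language between consecutive language-tagged tokens are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical tag for tokens with no language assignment (e.g. punctuation).
NEUTRAL = "OTHER"

def language_proportions(tags):
    """Return each language's share of the language-tagged tokens."""
    counts = Counter(t for t in tags if t != NEUTRAL)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lang: n / total for lang, n in counts.items()}

def count_switches(tags):
    """Count language changes between consecutive language-tagged tokens."""
    langs = [t for t in tags if t != NEUTRAL]
    return sum(1 for prev, cur in zip(langs, langs[1:]) if prev != cur)

# Toy tag sequence for one utterance (illustrative, not corpus data).
tags = ["TR", "TR", "DE", "OTHER", "DE", "TR"]
proportions = language_proportions(tags)
switches = count_switches(tags)
```

Neutral tokens are skipped in both functions so that punctuation or unclassifiable material neither skews the proportions nor splits a stretch of one language into two.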
