Speech Representations and Phoneme Classification for Preserving the Endangered Language of Ladin

08/27/2021
by Zane Durante, et al.

A vast majority of the world's 7,000 spoken languages are predicted to become extinct within this century, including the endangered language of Ladin from the Italian Alps. Linguists who work to preserve a language's phonetic and phonological structure can spend hours transcribing each minute of speech from native speakers. To address this problem in the context of Ladin, our paper presents the first analysis of speech representations and machine learning models for classifying 32 phonemes of Ladin. We experimented with a novel dataset of the Fascian dialect of Ladin, collected from native speakers in Italy. We developed frame-level and segment-level approaches to speech feature extraction and conducted extensive experiments with 8 different classifiers trained on 9 different speech representations. Our speech representations ranged from traditional features (MFCC, LPC) to features learned with deep neural network models (autoencoders, LSTM autoencoders, and WaveNet). Our highest-performing classifier, trained on MFCC representations of speech signals, achieved an 86% [...] obtained average accuracies above 77% [...]. Our findings contribute insights for learning discriminative Ladin phoneme representations and demonstrate the potential for leveraging machine learning and speech signal processing to preserve Ladin and other endangered languages.
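The abstract distinguishes frame-level from segment-level feature extraction. The sketch below illustrates that distinction under stated assumptions that do not come from the paper: a 16 kHz signal, 25 ms frames (400 samples) with a 10 ms hop (160 samples), and a segment-level vector formed by mean-pooling the frame-level vectors over one phoneme segment. The helper names are hypothetical, and the toy per-frame features (log energy and zero-crossing rate) only stand in for the MFCC or LPC vectors the authors used, which would normally be computed with a dedicated signal-processing library.

```python
import math
from statistics import fmean
from typing import List, Sequence


def frame_signal(signal: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split a 1-D signal into overlapping frames (hypothetical helper).

    Trailing samples that do not fill a whole frame are dropped.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    return [
        list(signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def frame_features(frame: Sequence[float]) -> List[float]:
    """Toy frame-level feature vector: [log energy, zero-crossing rate].

    Assumption: stands in for a real MFCC/LPC vector, not the paper's features.
    """
    energy = sum(x * x for x in frame)
    log_energy = math.log(energy + 1e-10)  # small offset avoids log(0) on silence
    if len(frame) < 2:
        zcr = 0.0
    else:
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        zcr = crossings / (len(frame) - 1)
    return [log_energy, zcr]


def segment_features(frame_vectors: List[List[float]]) -> List[float]:
    """Segment-level vector: per-dimension mean of the frame-level vectors.

    Assumption: mean pooling is one simple choice; the paper's method may differ.
    """
    if not frame_vectors:
        raise ValueError("segment has no frames")
    return [fmean(dim) for dim in zip(*frame_vectors)]


if __name__ == "__main__":
    sample_rate = 16_000  # assumed sampling rate
    # 0.1 s synthetic 220 Hz tone standing in for one phoneme segment.
    tone = [math.sin(2 * math.pi * 220 * n / sample_rate) for n in range(1600)]
    frames = frame_signal(tone, frame_len=400, hop=160)
    per_frame = [frame_features(f) for f in frames]  # frame-level representation
    segment_vec = segment_features(per_frame)        # segment-level representation
    print(len(frames), segment_vec)
```

A frame-level classifier would be trained on the rows of `per_frame` (one label per frame), whereas a segment-level classifier sees one fixed-length vector per phoneme, which is what makes pooling convenient for standard classifiers.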
