SylNet: An Adaptable End-to-End Syllable Count Estimator for Speech

06/24/2019
by Shreyas Seshadri, et al.

Automatic syllable count estimation (SCE) is used in a variety of applications, ranging from speaking rate estimation and detection of social activity from wearable microphones to developmental research that quantifies the speech heard by language-learning children in different environments. Most previously used SCE methods rely on heuristic digital signal processing (DSP), and only a few approaches, based on bi-directional long short-term memory (BLSTM) networks, have applied modern machine learning to the SCE task. This paper presents a novel end-to-end method called SylNet for automatic syllable counting from speech, built on recent developments in neural network architectures. We describe how the entire model can be optimized directly to minimize SCE error on the training data without annotations aligned at the syllable level, and how it can be adapted to new languages using limited speech data with known syllable counts. Experiments on several different languages reveal that SylNet generalizes to languages beyond its training data and further improves with adaptation. It also outperforms several previously proposed methods for syllabification, including end-to-end BLSTMs.
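
To illustrate what training against utterance-level counts without syllable-level alignment can look like, here is a minimal sketch. It is not SylNet's architecture: the linear per-frame scorer, the softplus non-linearity, the absolute-error loss, and the names `predict_count` and `count_loss` are all assumptions made for illustration. The only supervision used is one known syllable count per utterance.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)); keeps each frame's contribution non-negative.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def predict_count(frames, weights, bias):
    # Hypothetical per-frame scorer: each frame adds a non-negative amount,
    # and the utterance-level estimate is the sum over all frames.
    total = 0.0
    for frame in frames:
        activation = sum(w * f for w, f in zip(weights, frame)) + bias
        total += softplus(activation)
    return total


def count_loss(predicted, target):
    # Absolute error between the estimated and the known syllable count.
    # No frame-level or syllable-level alignment enters the loss.
    return abs(predicted - target)


# Toy utterance: 10 frames of 2-dimensional features, known to contain 3 syllables.
frames = [[0.1 * i, 1.0] for i in range(10)]
estimate = predict_count(frames, weights=[0.5, -1.0], bias=0.0)
print(count_loss(estimate, target=3))
```

Because the loss depends only on the summed estimate, the same objective could in principle be reused for adaptation to a new language, given a small set of utterances with known counts.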


research
07/23/2018

Automatic Speech Recognition for Humanitarian Applications in Somali

We present our first efforts in building an automatic speech recognition...
research
05/13/2021

Exploring CTC Based End-to-End Techniques for Myanmar Speech Recognition

In this work, we explore a Connectionist Temporal Classification (CTC) b...
research
12/08/2015

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

We show that an end-to-end deep learning approach can be used to recogni...
research
04/19/2016

Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss

Bidirectional long short-term memory (bi-LSTM) networks have recently pr...
research
09/23/2020

FluentNet: End-to-End Detection of Speech Disfluency with Deep Learning

Strong presentation skills are valuable and sought-after in workplace an...
research
04/03/2018

Contrastive Learning of Emoji-based Representations for Resource-Poor Languages

The introduction of emojis (or emoticons) in social media platforms has ...
research
09/11/2020

RECOApy: Data recording, pre-processing and phonetic transcription for end-to-end speech-based applications

Deep learning enables the development of efficient end-to-end speech pro...
