DNN-based Speech Synthesis for Indian Languages from ASCII text

08/18/2016
by Srikanth Ronanki, et al.

Text-to-Speech synthesis in Indian languages has seen a lot of progress over the past decade, partly due to the annual Blizzard Challenges. These systems assume the text to be written in Devanagari or Dravidian scripts, which have nearly phonemic orthographies. However, the most common form of computer interaction among Indians is ASCII-written transliterated text. Such text is generally noisy, with many spelling variations for the same word. In this paper we evaluate three approaches to synthesize speech from such noisy ASCII text: a naive Uni-Grapheme approach, a Multi-Grapheme approach, and a supervised Grapheme-to-Phoneme (G2P) approach. These methods first convert the ASCII text to a phonetic script and then train a Deep Neural Network to synthesize speech from it. We train and test our models on Blizzard Challenge datasets that were transliterated to ASCII using crowdsourcing. Our experiments on Hindi, Tamil and Telugu demonstrate that our models generate speech of competitive quality from ASCII text compared to speech synthesized from the native scripts. All the accompanying transliterated datasets are released for public access.
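To make the distinction between the first two front ends concrete, the sketch below (not taken from the paper; the digraph inventory and all function names are hypothetical) shows how a naive Uni-Grapheme front end would treat every ASCII character of a transliterated word as its own unit, whereas a Multi-Grapheme variant would first merge common romanised digraphs into single units before the sequence is handed to the DNN.

```python
# Minimal sketch (not the authors' implementation): turning noisy ASCII
# transliterations into unit sequences for Uni-Grapheme vs. Multi-Grapheme
# front ends. The digraph set below is illustrative only.

DIGRAPHS = {"aa", "ee", "ii", "oo", "uu", "th", "dh", "sh", "ch", "ng"}

def uni_grapheme_units(word: str) -> list[str]:
    """Naive Uni-Grapheme approach: every ASCII character is one unit."""
    return list(word.lower())

def multi_grapheme_units(word: str) -> list[str]:
    """Multi-Grapheme approach: greedily merge known digraphs into one unit."""
    word = word.lower()
    units, i = [], 0
    while i < len(word):
        if word[i:i + 2] in DIGRAPHS:   # merge a two-letter grapheme
            units.append(word[i:i + 2])
            i += 2
        else:                           # fall back to a single character
            units.append(word[i])
            i += 1
    return units

if __name__ == "__main__":
    print(uni_grapheme_units("dhanyavaad"))
    # ['d', 'h', 'a', 'n', 'y', 'a', 'v', 'a', 'a', 'd']
    print(multi_grapheme_units("dhanyavaad"))
    # ['dh', 'a', 'n', 'y', 'a', 'v', 'aa', 'd']
```

A supervised G2P front end would go one step further and map such unit sequences to phones in the target language's phone set, which is the third approach the paper evaluates.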

