Low-Resource Mongolian Speech Synthesis Based on Automatic Prosody Annotation

11/17/2022
by Xin Yuan, et al.

While deep learning-based text-to-speech (TTS) models such as VITS have shown excellent results, they typically require a sizable set of high-quality <text, audio> pairs for training, which is expensive to collect. Most of the world's languages still lack the training data needed to develop TTS systems. This paper proposes two improvements that address two problems in low-resource Mongolian speech synthesis: a) Because high-quality <text, audio> pairs are scarce, the mapping from linguistic features to acoustic features is difficult to model. This is addressed with a pre-trained VITS model and transfer learning. b) Because little labeled information is available, this paper proposes an automatic prosody annotation method that labels the prosodic information of the text and the corresponding speech, thereby improving the naturalness and intelligibility of the synthesized low-resource Mongolian speech. In experiments, the proposed method achieves a naturalness MOS (N-MOS) of 4.195 and an intelligibility MOS (I-MOS) of 4.228.
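For readers unfamiliar with the metric, a mean opinion score (MOS) is conventionally the arithmetic mean of listener ratings on a 1-5 scale. The sketch below shows one common way to compute such a score together with a normal-approximation 95% confidence interval. The function name `mean_opinion_score` and the ratings are hypothetical and chosen purely for illustration; they are not the paper's evaluation code or data, and the abstract does not say how its scores were aggregated.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (mean, 95% CI half-width) for listener ratings on a 1-5 scale."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mean = statistics.fmean(ratings)
    # Normal-approximation interval (assumption); a small listener panel
    # may call for a t-distribution instead.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width


# Hypothetical naturalness ratings, for illustration only (not from the paper).
naturalness_ratings = [4, 5, 4, 4, 5, 3, 4, 5]
mos, ci = mean_opinion_score(naturalness_ratings)
print(f"N-MOS = {mos:.3f} +/- {ci:.3f}")
```

The same function could be applied to intelligibility ratings to obtain an I-MOS; reporting the interval alongside the mean makes it easier to judge whether a difference between two systems is meaningful.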

