On-device neural speech synthesis

09/17/2021
by Sivanand Achanta, et al.

Recent advances in text-to-speech (TTS) synthesis, such as Tacotron and WaveRNN, have made it possible to construct a fully neural network based TTS system by coupling the two components together. Such a system is conceptually simple: it takes only grapheme or phoneme input, uses a Mel-spectrogram as an intermediate feature, and directly generates speech samples. The system achieves quality equal to or close to that of natural speech. However, its high computational cost and robustness issues have limited its use in real-world speech synthesis applications and products. In this paper, we present key modeling improvements and optimization strategies that enable deploying these models not only on GPU servers but also on mobile devices. The proposed system can generate high-quality 24 kHz speech 5x faster than real time on servers and 3x faster than real time on mobile devices.
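
To make the speed figures concrete, the short sketch below converts a "faster than real time" factor into the number of audio samples that must be produced per wall-clock second, and the resulting time budget per sample. The function names are hypothetical and chosen for illustration only. The per-sample budget assumes a vocoder that emits one sample per step, which is a simplification and not a description of the paper's implementation.

```python
def required_throughput(sample_rate_hz: int, speedup: float) -> float:
    """Audio samples that must be generated per wall-clock second.

    speedup is the 'x faster than real time' factor (1.0 = exactly real time).
    """
    return sample_rate_hz * speedup


def per_sample_budget_us(sample_rate_hz: int, speedup: float) -> float:
    """Average time available per sample, in microseconds.

    Assumption: one sample is produced per generation step.
    """
    return 1e6 / required_throughput(sample_rate_hz, speedup)


# Figures quoted in the abstract: 24 kHz audio, 5x on server, 3x on mobile.
server_rate = required_throughput(24_000, 5.0)
mobile_rate = required_throughput(24_000, 3.0)

print(f"server: {server_rate:.0f} samples/s")
print(f"mobile: {mobile_rate:.0f} samples/s")
print(f"mobile budget: {per_sample_budget_us(24_000, 3.0):.2f} us/sample")
```

Under these assumptions, 3x real time at 24 kHz leaves roughly 14 microseconds per sample on average, which illustrates why computational cost is the main obstacle to on-device deployment.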

research
02/22/2022

Neural Speech Synthesis on a Shoestring: Improving the Efficiency of LPCNet

Neural speech synthesis models can synthesize high quality speech but ty...
research
10/28/2018

LPCNet: Improving Neural Speech Synthesis Through Linear Prediction

Neural speech synthesis models have recently demonstrated the ability to...
research
11/09/2020

FUN! Fast, Universal, Non-Semantic Speech Embeddings

Learned speech representations can drastically improve performance on ta...
research
03/06/2022

Variational Auto-Encoder based Mandarin Speech Cloning

Speech cloning technology is becoming more sophisticated thanks to the a...
research
12/07/2021

Training end-to-end speech-to-text models on mobile phones

Training the state-of-the-art speech-to-text (STT) models in mobile devi...
research
03/28/2019

A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet

Neural speech synthesis algorithms are a promising new approach for codi...
research
07/03/2023

Squeezing Large-Scale Diffusion Models for Mobile

The emergence of diffusion models has greatly broadened the scope of hig...
