Tacotron: Towards End-to-End Speech Synthesis

03/29/2017
by Yuxuan Wang, et al.

A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis frontend, an acoustic model and an audio synthesis module. Building these components often requires extensive domain expertise and may contain brittle design choices. In this paper, we present Tacotron, an end-to-end generative text-to-speech model that synthesizes speech directly from characters. Given <text, audio> pairs, the model can be trained completely from scratch with random initialization. We present several key techniques to make the sequence-to-sequence framework perform well for this challenging task. Tacotron achieves a 3.82 subjective 5-scale mean opinion score on US English, outperforming a production parametric system in terms of naturalness. In addition, since Tacotron generates speech at the frame level, it's substantially faster than sample-level autoregressive methods.

research
08/01/2021

End to End Bangla Speech Synthesis

Text-to-Speech (TTS) system is a system where speech is synthesized from...
research
12/16/2017

Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

This paper describes Tacotron 2, a neural network architecture for speec...
research
11/11/2019

A unified sequence-to-sequence front-end model for Mandarin text-to-speech synthesis

In Mandarin text-to-speech (TTS) system, the front-end text processing m...
research
07/22/2020

A Transfer Learning End-to-End Arabic Text-To-Speech (TTS) Deep Architecture

Speech synthesis is the artificial production of human speech. A typical...
research
06/05/2020

End-to-End Adversarial Text-to-Speech

Modern text-to-speech synthesis pipelines typically involve multiple pro...
research
04/08/2021

Flavored Tacotron: Conditional Learning for Prosodic-linguistic Features

Neural sequence-to-sequence text-to-speech synthesis (TTS), such as Taco...
research
03/04/2020

GraphTTS: graph-to-sequence modelling in neural text-to-speech

This paper leverages the graph-to-sequence method in neural text-to-spee...
