Expressive Speech Synthesis via Modeling Expressions with Variational Autoencoder

04/06/2018
by Kei Akuzawa, et al.

Recent advances in neural autoregressive models have improved the performance of speech synthesis (SS). However, because these models lack the ability to model global characteristics of speech (such as speaker individualities or speaking styles), particularly when these characteristics have not been labeled, making neural autoregressive SS systems more expressive remains an open issue. In this paper, we propose combining VoiceLoop, an autoregressive SS model, with a Variational Autoencoder (VAE). Unlike traditional autoregressive SS systems, this approach uses the VAE to model the global characteristics explicitly, enabling the expressiveness of the synthesized speech to be controlled in an unsupervised manner. Experiments using the VCTK and Blizzard2012 datasets show that the VAE helps VoiceLoop generate higher-quality speech and control the expressions in its synthesized speech by incorporating global characteristics into the speech generation process.
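The abstract gives no architectural details, so the following is only a generic sketch of the VAE ingredients it alludes to: sampling a global latent vector with the reparameterization trick, penalizing it with a KL term against a standard normal prior, and passing it to an autoregressive decoder at every step. All function names are hypothetical, and the concatenation-based conditioning and the latent interpolation are assumptions made for illustration, not the authors' VoiceLoop integration.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def condition_step_input(step_input, z):
    """Assumed conditioning scheme: append the utterance-level latent z
    to the decoder input at every autoregressive step."""
    return list(step_input) + list(z)


def interpolate(z_a, z_b, alpha):
    """Linear interpolation between two latents, one simple way to vary
    a global characteristic without labels (illustrative only)."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(z_a, z_b)]


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [0.5, -0.2], [-1.0, -1.0]  # made-up encoder outputs
    z = sample_latent(mu, log_var, rng)
    print("KL term:", kl_to_standard_normal(mu, log_var))
    print("decoder input size:", len(condition_step_input([0.1, 0.2, 0.3], z)))
```

In a real system the mean and log-variance would come from an encoder network that reads the whole utterance, and the KL term would be added to the reconstruction loss during training.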


research
04/08/2022

Hierarchical and Multi-Scale Variational Autoencoder for Diverse and Natural Non-Autoregressive Text-to-Speech

This paper proposes a hierarchical and multi-scale variational autoencod...
research
08/25/2023

Expressive paragraph text-to-speech synthesis with multi-step variational autoencoder

Neural networks have been able to generate high-quality single-sentence ...
research
11/02/2022

Predicting phoneme-level prosody latents using AR and flow-based Prior Networks for expressive speech synthesis

A large part of the expressive speech synthesis literature focuses on le...
research
07/20/2023

SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer

Expressive speech synthesis models are trained by adding corpora with di...
research
04/28/2021

Learning deep autoregressive models for hierarchical data

We propose a model for hierarchical structured data as an extension to t...
research
12/03/2017

Spatial PixelCNN: Generating Images from Patches

In this paper we propose Spatial PixelCNN, a conditional autoregressive ...
research
05/09/2022

ReCAB-VAE: Gumbel-Softmax Variational Inference Based on Analytic Divergence

The Gumbel-softmax distribution, or Concrete distribution, is often used...
