State-of-the-art text-to-speech (TTS) systems have utilized pretrained
l...
We present a scalable method to produce high quality emphasis for
text-t...
We present eCat, a novel end-to-end multispeaker model capable of: a)
ge...
Generating expressive and contextually appropriate prosody remains a
cha...
Duration modelling has become an important research problem once more wi...
In this paper, we present CopyCat2 (CC2), a novel model capable of: a)
s...
This paper presents a novel data augmentation technique for text-to-spee...
We propose a novel Multi-Scale Spectrogram (MSS) modelling approach to
s...
Many factors influence speech yielding different renditions of a given
s...
Recently, state-of-the-art text-to-speech synthesis systems have shif...
In this paper, we introduce Kathaka, a model trained with a novel two-st...
This paper addresses the problem of estimating the voice source directly...
Prosody Transfer (PT) is a technique that aims to use the prosody from a...
This paper proposes a method to improve the quality delivered by statist...
We present a novel system for singing synthesis, based on attention. Sta...
We present an approach to synthesize whisper by applying a handcrafted s...
Pitch detection is a fundamental problem in speech processing as F0 is u...
Statistical TTS systems that directly predict the speech waveform have
r...
The 11th Summer Workshop on Multimodal Interfaces, eNTERFACE 2015, was hos...