Zero-shot text-to-speech aims at synthesizing voices with unseen speech...
Scaling text-to-speech to a large and wild dataset has been proven to be...
We are interested in a novel task, namely low-resource text-to-talking a...
Various applications of voice synthesis have been developed independentl...
Large diffusion models have been successful in text-to-audio (T2A) synth...
Direct speech-to-speech translation (S2ST) aims to convert speech from o...
Stutter removal is an essential scenario in the field of speech editing....
Improving text representation has attracted much attention to achieve ex...
We are interested in a challenging task, Realistic-Music-Score based Sin...
Generating talking person portraits with arbitrary speech audio is a cru...
Large language models (LLMs) have exhibited remarkable capabilities acro...
Generating photo-realistic video portrait with arbitrary speech audio is...
Large-scale multimodal generative modeling has created milestones in tex...
Polyphone disambiguation aims to capture accurate pronunciation knowledg...
The recent progress in non-autoregressive text-to-speech (NAR-TTS) has m...
The recent progress in multi-agent deep reinforcement learning (MADRL) ma...
Exploration of the high-dimensional state action space is one of the big...