Direct speech-to-speech translation (S2ST) with discrete self-supervised...
Scaling text-to-speech to a large and wild dataset has been proven to be...
Diffusion models have demonstrated impressive performance in text-to-ima...
Various applications of voice synthesis have been developed independentl...
Large diffusion models have been successful in text-to-audio (T2A) synth...
Direct speech-to-speech translation (S2ST) aims to convert speech from o...
Stutter removal is an essential scenario in the field of speech editing....
Text-to-speech (TTS) has undergone remarkable improvements in performance...
Speech-to-SQL (S2SQL) aims to convert spoken questions into SQL queries ...
Improving text representation has attracted much attention to achieve ex...
We are interested in a challenging task, Realistic-Music-Score based Sin...
The speech-to-singing (STS) voice conversion task aims to generate singi...
Audio codec models are widely used in audio communication as a crucial t...
Generating talking person portraits with arbitrary speech audio is a cru...
Large language models (LLMs) have exhibited remarkable capabilities acro...
Multi-media communications facilitate global interaction among people. H...
Expressive text-to-speech (TTS) aims to synthesize different speaking st...
Large-scale multimodal generative modeling has created milestones in tex...
Video to sound generation aims to generate realistic and natural sound g...
Denoising diffusion probabilistic models (DDPMs) have recently achieved ...
Direct speech-to-speech translation (S2ST) systems leverage recent progr...
Style transfer for out-of-domain (OOD) speech synthesis aims to generate...
High-fidelity multi-singer singing voice synthesis is challenging for ne...
High-fidelity singing voice synthesis is challenging for neural vocoders...
Recently, there has been an increasing interest in neural speech synthes...