Zhenhui Ye

research

∙ 07/14/2023

Mega-TTS 2: Zero-Shot Text-to-Speech with Arbitrary Length Speech Prompts

Zero-shot text-to-speech aims at synthesizing voices with unseen speech ...

0 Ziyue Jiang, et al. ∙

research

∙ 06/06/2023

Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias

Scaling text-to-speech to a large and wild dataset has been proven to be...

0 Ziyue Jiang, et al. ∙

research

∙ 06/06/2023

Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis

We are interested in a novel task, namely low-resource text-to-talking a...

0 Zhenhui Ye, et al. ∙

research

∙ 05/30/2023

Make-A-Voice: Unified Voice Synthesis With Discrete Representation

Various applications of voice synthesis have been developed independentl...

0 Rongjie Huang, et al. ∙

research

∙ 05/29/2023

Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation

Large diffusion models have been successful in text-to-audio (T2A) synth...

0 Jiawei Huang, et al. ∙

research

∙ 05/24/2023

AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation

Direct speech-to-speech translation (S2ST) aims to convert speech from o...

0 Rongjie Huang, et al. ∙

research

∙ 05/23/2023

FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models

Stutter removal is an essential scenario in the field of speech editing....

0 Ziyue Jiang, et al. ∙

research

∙ 05/18/2023

CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training

Improving text representation has attracted much attention to achieve ex...

0 Zhenhui Ye, et al. ∙

research

∙ 05/18/2023

RMSSinger: Realistic-Music-Score based Singing Voice Synthesis

We are interested in a challenging task, Realistic-Music-Score based Sin...

0 Jinzheng He, et al. ∙

research

∙ 05/01/2023

GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation

Generating talking person portraits with arbitrary speech audio is a cru...

8 Zhenhui Ye, et al. ∙

research

∙ 04/25/2023

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

Large language models (LLMs) have exhibited remarkable capabilities acro...

7 Rongjie Huang, et al. ∙

research

∙ 01/31/2023

GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis

Generating photo-realistic video portrait with arbitrary speech audio is...

3 Zhenhui Ye, et al. ∙

research

∙ 01/30/2023

Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models

Large-scale multimodal generative modeling has created milestones in tex...

1 Rongjie Huang, et al. ∙

research

∙ 06/05/2022

Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech

Polyphone disambiguation aims to capture accurate pronunciation knowledg...

0 Ziyue Jiang, et al. ∙

research

∙ 04/25/2022

SyntaSpeech: Syntax-Aware Generative Adversarial Text-to-Speech

The recent progress in non-autoregressive text-to-speech (NAR-TTS) has m...

0 Zhenhui Ye, et al. ∙

research

∙ 09/05/2021

Soft Hierarchical Graph Recurrent Networks for Many-Agent Partially Observable Environments

The recent progress in multi-agent deep reinforcement learning(MADRL) ma...

0 Zhenhui Ye, et al. ∙

research

∙ 05/19/2020

Experience Augmentation: Boosting and Accelerating Off-Policy Multi-Agent Reinforcement Learning

Exploration of the high-dimensional state action space is one of the big...

0 Zhenhui Ye, et al. ∙

Zhenhui Ye

Featured Co-authors

Sign in with Google

Consider DeepAI Pro