Rongjie Huang

09/14/2023

Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer

Direct speech-to-speech translation (S2ST) with discrete self-supervised...

Yongqi Wang, et al.

06/06/2023

Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias

Scaling text-to-speech to a large and wild dataset has been proven to be...

Ziyue Jiang, et al.

06/04/2023

Detector Guidance for Multi-Object Text-to-Image Generation

Diffusion models have demonstrated impressive performance in text-to-ima...

Luping Liu, et al.

05/30/2023

Make-A-Voice: Unified Voice Synthesis With Discrete Representation

Various applications of voice synthesis have been developed independentl...

Rongjie Huang, et al.

05/29/2023

Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation

Large diffusion models have been successful in text-to-audio (T2A) synth...

Jiawei Huang, et al.

05/24/2023

AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation

Direct speech-to-speech translation (S2ST) aims to convert speech from o...

Rongjie Huang, et al.

05/23/2023

FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models

Stutter removal is an essential scenario in the field of speech editing....

Ziyue Jiang, et al.

05/22/2023

ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer

Text-to-speech (TTS) has undergone remarkable improvements in performance...

Huadai Liu, et al.

05/21/2023

Wav2SQL: Direct Generalizable Speech-To-SQL Parsing

Speech-to-SQL (S2SQL) aims to convert spoken questions into SQL queries ...

Huadai Liu, et al.

05/18/2023

CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training

Improving text representation has attracted much attention to achieve ex...

Zhenhui Ye, et al.

05/18/2023

RMSSinger: Realistic-Music-Score based Singing Voice Synthesis

We are interested in a challenging task, Realistic-Music-Score based Sin...

Jinzheng He, et al.

05/08/2023

AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment

The speech-to-singing (STS) voice conversion task aims to generate singi...

Ruiqi Li, et al.

05/04/2023

HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec

Audio codec models are widely used in audio communication as a crucial t...

Dongchao Yang, et al.

05/01/2023

GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation

Generating talking person portraits with arbitrary speech audio is a cru...

Zhenhui Ye, et al.

04/25/2023

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

Large language models (LLMs) have exhibited remarkable capabilities acro...

Rongjie Huang, et al.

03/09/2023

MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition

Multi-media communications facilitate global interaction among people. H...

Xize Cheng, et al.

01/31/2023

InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt

Expressive text-to-speech (TTS) aims to synthesize different speaking st...

Dongchao Yang, et al.

01/30/2023

Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models

Large-scale multimodal generative modeling has created milestones in tex...

Rongjie Huang, et al.

11/19/2022

VarietySound: Timbre-Controllable Video to Sound Generation via Unsupervised Information Disentanglement

Video to sound generation aims to generate realistic and natural sound g...

Chenye Cui, et al.

07/13/2022

ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech

Denoising diffusion probabilistic models (DDPMs) have recently achieved ...

Rongjie Huang, et al.

05/25/2022

TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation

Direct speech-to-speech translation (S2ST) systems leverage recent progr...

Rongjie Huang, et al.

05/15/2022

GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech Synthesis

Style transfer for out-of-domain (OOD) speech synthesis aims to generate...

Rongjie Huang, et al.

12/20/2021

Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large-Scale Corpus

High-fidelity multi-singer singing voice synthesis is challenging for ne...

Rongjie Huang, et al.

10/14/2021

SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation

High-fidelity singing voice synthesis is challenging for neural vocoders...

Feiyang Chen, et al.

06/17/2021

EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model

Recently, there has been an increasing interest in neural speech synthes...

Chenye Cui, et al.
