Jinglin Liu

research

∙ 08/29/2023

C2G2: Controllable Co-speech Gesture Generation with Latent Diffusion Model

Co-speech gesture generation is crucial for automatic digital avatar ani...

0 Longbin Ji, et al. ∙

research

∙ 07/14/2023

Mega-TTS 2: Zero-Shot Text-to-Speech with Arbitrary Length Speech Prompts

Zero-shot text-to-speech aims at synthesizing voices with unseen speech ...

0 Ziyue Jiang, et al. ∙

research

∙ 06/06/2023

Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias

Scaling text-to-speech to a large and wild dataset has been proven to be...

0 Ziyue Jiang, et al. ∙

research

∙ 06/06/2023

Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis

We are interested in a novel task, namely low-resource text-to-talking a...

0 Zhenhui Ye, et al. ∙

research

∙ 05/29/2023

Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation

Large diffusion models have been successful in text-to-audio (T2A) synth...

0 Jiawei Huang, et al. ∙

research

∙ 05/24/2023

AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation

Direct speech-to-speech translation (S2ST) aims to convert speech from o...

0 Rongjie Huang, et al. ∙

research

∙ 05/18/2023

CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training

Improving text representation has attracted much attention to achieve ex...

0 Zhenhui Ye, et al. ∙

research

∙ 05/18/2023

RMSSinger: Realistic-Music-Score based Singing Voice Synthesis

We are interested in a challenging task, Realistic-Music-Score based Sin...

0 Jinzheng He, et al. ∙

research

∙ 05/08/2023

AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment

The speech-to-singing (STS) voice conversion task aims to generate singi...

0 Ruiqi Li, et al. ∙

research

∙ 05/01/2023

GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation

Generating talking person portraits with arbitrary speech audio is a cru...

8 Zhenhui Ye, et al. ∙

research

∙ 04/25/2023

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

Large language models (LLMs) have exhibited remarkable capabilities acro...

7 Rongjie Huang, et al. ∙

research

∙ 03/24/2023

MUG: A General Meeting Understanding and Generation Benchmark

Listening to long video/audio recordings from video conferencing and onl...

5 Qinglin Zhang, et al. ∙

research

∙ 03/24/2023

Overview of the ICASSP 2023 General Meeting Understanding and Generation Challenge (MUG)

ICASSP2023 General Meeting Understanding and Generation Challenge (MUG) ...

0 Qinglin Zhang, et al. ∙

research

∙ 01/31/2023

GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis

Generating photo-realistic video portrait with arbitrary speech audio is...

3 Zhenhui Ye, et al. ∙

research

∙ 01/30/2023

Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models

Large-scale multimodal generative modeling has created milestones in tex...

1 Rongjie Huang, et al. ∙

research

∙ 11/19/2022

VarietySound: Timbre-Controllable Video to Sound Generation via Unsupervised Information Disentanglement

Video to sound generation aims to generate realistic and natural sound g...

0 Chenye Cui, et al. ∙

research

∙ 07/13/2022

ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech

Denoising diffusion probabilistic models (DDPMs) have recently achieved ...

0 Rongjie Huang, et al. ∙

research

∙ 06/05/2022

Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech

Polyphone disambiguation aims to capture accurate pronunciation knowledg...

0 Ziyue Jiang, et al. ∙

research

∙ 05/25/2022

TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation

Direct speech-to-speech translation (S2ST) systems leverage recent progr...

0 Rongjie Huang, et al. ∙

research

∙ 05/15/2022

GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech Synthesis

Style transfer for out-of-domain (OOD) speech synthesis aims to generate...

0 Rongjie Huang, et al. ∙

research

∙ 02/27/2022

Learning the Beauty in Songs: Neural Singing Voice Beautifier

We are interested in a novel task, singing voice beautifying (SVB). Give...

18 Jinglin Liu, et al. ∙

research

∙ 01/11/2022

MR-SVS: Singing Voice Synthesis with Multi-Reference Encoder

Multi-speaker singing voice synthesis is to generate the singing voice s...

0 Shoutong Wang, et al. ∙

research

∙ 12/20/2021

Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large-Scale Corpus

High-fidelity multi-singer singing voice synthesis is challenging for ne...

0 Rongjie Huang, et al. ∙

research

∙ 12/08/2021

SimulSLT: End-to-End Simultaneous Sign Language Translation

Sign language translation as a kind of technology with profound social s...

0 Aoxiong Yin, et al. ∙

research

∙ 10/14/2021

SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation

High-fidelity singing voice synthesis is challenging for neural vocoders...

0 Feiyang Chen, et al. ∙

research

∙ 09/30/2021

PortaSpeech: Portable and High-Quality Generative Text-to-Speech

Non-autoregressive text-to-speech (NAR-TTS) models such as FastSpeech 2 ...

0 Yi Ren, et al. ∙

research

∙ 08/31/2021

SimulLR: Simultaneous Lip Reading Transducer with Attention-Guided Adaptive Memory

Lip reading, aiming to recognize spoken sentences according to the given...

0 Zhijie Lin, et al. ∙

research

∙ 07/14/2021

High-Speed and High-Quality Text-to-Lip Generation

As a key component of talking face generation, lip movements generation ...

0 Jinglin Liu, et al. ∙

research

∙ 06/17/2021

EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model

Recently, there has been an increasing interest in neural speech synthes...

0 Chenye Cui, et al. ∙

research

∙ 05/06/2021

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism

Singing voice synthesis (SVS) system is built to synthesize high-quality...

12 Jinglin Liu, et al. ∙

research

∙ 12/17/2020

DenoiSpeech: Denoising Text to Speech with Frame-Level Noise Modeling

While neural-based text to speech (TTS) models can synthesize natural an...

0 Chen Zhang, et al. ∙

research

∙ 08/06/2020

FastLR: Non-Autoregressive Lipreading Model with Integrate-and-Fire

Lipreading is an impressive technique and there has been a definite impr...

7 Jinglin Liu, et al. ∙

research

∙ 07/17/2020

Task-Level Curriculum Learning for Non-Autoregressive Neural Machine Translation

Non-autoregressive translation (NAT) achieves faster inference speed but...

0 Jinglin Liu, et al. ∙

research

∙ 04/22/2020

A Study of Non-autoregressive Model for Sequence Generation

Non-autoregressive (NAR) models generate all the tokens of a sequence in...

0 Yi Ren, et al. ∙
