DiffMotion: Speech-Driven Gesture Synthesis Using Denoising Diffusion Model

01/24/2023
by   Fan Zhang, et al.

Speech-driven gesture synthesis is a field of growing interest in virtual human creation. A critical challenge, however, is the inherently intricate one-to-many mapping between speech and gestures. Previous studies have explored this problem with generative models and achieved significant progress; nevertheless, most synthesized gestures still appear far less natural than human motion. This paper presents DiffMotion, a novel speech-driven gesture synthesis architecture based on diffusion models. The model comprises an autoregressive temporal encoder and a denoising diffusion probabilistic module. The encoder extracts the temporal context of the speech input and the historical gestures. The diffusion module learns a parameterized Markov chain that gradually converts a simple distribution into the complex gesture distribution, generating gestures conditioned on the accompanying speech. Objective and subjective evaluations against baselines confirm that our approach produces natural and diverse gesticulation and demonstrate the benefits of diffusion-based models for speech-driven gesture synthesis.
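To make the abstract's two components concrete, the sketch below illustrates the general technique it describes: an autoregressive encoder that summarizes speech features and past gestures into a context vector, and a denoising diffusion module trained to reverse a fixed noising Markov chain that turns gesture frames into Gaussian noise. This is a minimal PyTorch illustration under stated assumptions, not the authors' implementation; all names, dimensions, and hyperparameters (TemporalEncoder, gesture_dim, the 1000-step linear beta schedule) are illustrative.

```python
# Minimal sketch of a conditional DDPM for gesture frames (PyTorch).
# All module names, dimensions, and the beta schedule are illustrative
# assumptions, not DiffMotion's actual implementation.
import torch
import torch.nn as nn

class TemporalEncoder(nn.Module):
    """Autoregressive encoder: summarizes speech features and past gestures."""
    def __init__(self, speech_dim=64, gesture_dim=45, hidden_dim=256):
        super().__init__()
        self.rnn = nn.GRU(speech_dim + gesture_dim, hidden_dim, batch_first=True)

    def forward(self, speech, past_gestures):
        # speech: (B, T, speech_dim), past_gestures: (B, T, gesture_dim)
        out, _ = self.rnn(torch.cat([speech, past_gestures], dim=-1))
        return out[:, -1]  # context vector for the current frame: (B, hidden_dim)

class DenoisingNet(nn.Module):
    """Predicts the noise added to a gesture frame, given step t and context."""
    def __init__(self, gesture_dim=45, hidden_dim=256, n_steps=1000):
        super().__init__()
        self.t_embed = nn.Embedding(n_steps, hidden_dim)
        self.net = nn.Sequential(
            nn.Linear(gesture_dim + 2 * hidden_dim, hidden_dim),
            nn.SiLU(),
            nn.Linear(hidden_dim, gesture_dim),
        )

    def forward(self, x_t, t, context):
        return self.net(torch.cat([x_t, self.t_embed(t), context], dim=-1))

# Fixed forward (noising) Markov chain: linear beta schedule, standard in DDPMs.
n_steps = 1000
betas = torch.linspace(1e-4, 0.02, n_steps)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def training_loss(encoder, denoiser, speech, past_gestures, x0):
    """Noise a clean frame x0 at a random step and regress the injected noise."""
    context = encoder(speech, past_gestures)
    t = torch.randint(0, n_steps, (x0.shape[0],))
    eps = torch.randn_like(x0)
    a_bar = alphas_bar[t].unsqueeze(-1)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps
    return nn.functional.mse_loss(denoiser(x_t, t, context), eps)

@torch.no_grad()
def sample_frame(denoiser, context, gesture_dim=45):
    """Reverse Markov chain: start from N(0, I) and denoise step by step."""
    x = torch.randn(context.shape[0], gesture_dim)
    for i in reversed(range(n_steps)):
        t = torch.full((x.shape[0],), i, dtype=torch.long)
        eps_hat = denoiser(x, t, context)
        a_bar = alphas_bar[i]
        mean = (x - betas[i] / (1.0 - a_bar).sqrt() * eps_hat) / (1.0 - betas[i]).sqrt()
        x = mean + betas[i].sqrt() * torch.randn_like(x) if i > 0 else mean
    return x
```

In a full autoregressive pipeline, each sampled frame would presumably be appended to past_gestures so the encoder advances along the speech; the conditioning on the context vector is what ties the reverse chain to the accompanying audio.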


research · 05/08/2023
DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models
The art of communication beyond speech there are gestures. The automatic...

research · 06/15/2023
Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis
With read-aloud speech synthesis achieving high naturalness scores, ther...

research · 09/11/2023
Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation
This paper describes a system developed for the GENEA (Generation and Ev...

research · 07/31/2021
Speech2AffectiveGestures: Synthesizing Co-Speech Gestures with Generative Adversarial Affective Expression Learning
We present a generative adversarial network to synthesize 3D pose sequen...

research · 06/20/2023
EMoG: Synthesizing Emotive Co-speech 3D Gesture with Diffusion Model
Although previous co-speech gesture generation methods are able to synth...

research · 02/02/2021
SPEAK WITH YOUR HANDS Using Continuous Hand Gestures to control Articulatory Speech Synthesizer
This work presents our advancements in controlling an articulatory speec...

research · 08/11/2023
Audio is all in one: speech-driven gesture synthetics using WavLM pre-trained model
The generation of co-speech gestures for digital humans is an emerging a...
