Speech-Driven 3D Face Animation with Composite and Regional Facial Movements

by Haozhe Wu, et al.

Speech-driven 3D face animation poses significant challenges due to the intricacy and variability of human facial movements. This paper emphasizes the importance of considering both the composite and regional natures of facial movements in speech-driven 3D face animation. The composite nature refers to how speech-independent factors globally modulate speech-driven facial movements along the temporal dimension, while the regional nature refers to the fact that facial movements are not globally correlated but are driven by local musculature along the spatial dimension. Incorporating both natures is therefore essential for generating vivid animation. To address the composite nature, we introduce an adaptive modulation module that uses arbitrary facial movements to dynamically adjust speech-driven facial movements across frames on a global scale. To accommodate the regional nature, our approach ensures that each component of the per-frame facial features attends to the local spatial movements of 3D faces. Moreover, we present a non-autoregressive backbone for translating audio into 3D facial movements, which preserves high-frequency details of facial movements and enables efficient inference. Comprehensive experiments and user studies demonstrate that our method surpasses contemporary state-of-the-art approaches both qualitatively and quantitatively.
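The abstract does not specify the form of the adaptive modulation module, but the described behavior (speech-independent factors globally rescaling speech-driven features across all frames) resembles FiLM-style feature modulation. The following is a minimal NumPy sketch under that assumption; the dimensions, weight matrices, and variable names (`style_code`, `W_gamma`, `W_beta`) are all hypothetical illustrations, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

T, D_audio, D_style = 8, 16, 4  # frames, feature dims (illustrative values)

# Per-frame speech-driven features (e.g. from an audio encoder)
speech_feats = rng.standard_normal((T, D_audio))
# Speech-independent code extracted from arbitrary facial movements
style_code = rng.standard_normal(D_style)

# Hypothetical learned projections mapping the style code to
# per-channel scale and shift parameters (FiLM-style)
W_gamma = rng.standard_normal((D_style, D_audio)) * 0.1
W_beta = rng.standard_normal((D_style, D_audio)) * 0.1

gamma = 1.0 + style_code @ W_gamma  # scale, centered at identity
beta = style_code @ W_beta          # shift

# Global temporal modulation: the same scale/shift is broadcast
# over every frame, adjusting the whole sequence at once
modulated = gamma * speech_feats + beta
print(modulated.shape)  # (8, 16)
```

Because `gamma` and `beta` depend only on the style code, not on the frame index, the modulation acts globally along the temporal dimension, matching the composite-nature description above.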



