CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior

01/06/2023
by   Jinbo Xing, et al.
0

Speech-driven 3D facial animation has been widely studied, yet there is still a gap to achieving realism and vividness due to the highly ill-posed nature and scarcity of audio-visual data. Existing works typically formulate the cross-modal mapping into a regression task, which suffers from the regression-to-mean problem leading to over-smoothed facial motions. In this paper, we propose to cast speech-driven facial animation as a code query task in a finite proxy space of the learned codebook, which effectively promotes the vividness of the generated motions by reducing the cross-modal mapping uncertainty. The codebook is learned by self-reconstruction over real facial motions and thus embedded with realistic facial motion priors. Over the discrete motion space, a temporal autoregressive model is employed to sequentially synthesize facial motions from the input speech signal, which guarantees lip-sync as well as plausible facial expressions. We demonstrate that our approach outperforms current state-of-the-art methods both qualitatively and quantitatively. Also, a user study further justifies our superiority in perceptual quality.

READ FULL TEXT

page 7

page 8

page 15

research
04/16/2021

MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement

This paper presents a generic method for generating full facial 3D anima...
research
12/10/2021

FaceFormer: Speech-Driven 3D Facial Animation with Transformers

Speech-driven 3D facial animation is challenging due to the complex geom...
research
12/04/2021

Joint Audio-Text Model for Expressive Speech-Driven 3D Facial Animation

Speech-driven 3D facial animation with accurate lip synchronization has ...
research
01/17/2023

Audio2Gestures: Generating Diverse Gestures from Audio

People may perform diverse gestures affected by various mental and physi...
research
08/15/2021

Audio2Gestures: Generating Diverse Gestures from Speech Audio with Conditional Variational Autoencoders

Generating conversational gestures from speech audio is challenging due ...
research
01/15/2023

Learning Audio-Driven Viseme Dynamics for 3D Face Animation

We present a novel audio-driven facial animation approach that can gener...
research
08/10/2023

Speech-Driven 3D Face Animation with Composite and Regional Facial Movements

Speech-driven 3D face animation poses significant challenges due to the ...

Please sign up or login with your details

Forgot password? Click here to reset