Facial Landmark Predictions with Applications to Metaverse

09/29/2022
by Qiao Han, et al.

This research aims to make metaverse characters more realistic by adding lip animations learnt from videos in the wild. Our approach extends the Tacotron 2 text-to-speech synthesizer so that it generates lip movements together with the mel spectrogram in a single pass. The encoder and gate-layer weights are pre-trained on the LJ Speech 1.1 dataset, while the decoder is retrained on 93 TED talk clips extracted from the LRS3 dataset. Our novel decoder predicts the displacements of 20 lip landmarks over time, using labels extracted automatically by the OpenFace 2.0 landmark predictor. Training converged in 7 hours using fewer than 5 minutes of video. We conducted an ablation study on the Pre-Net/Post-Net and the pre-trained encoder weights to demonstrate the effectiveness of transfer learning between audio and visual speech data.
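
The lip-landmark labels and the joint audio-visual target described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' code, and it rests on assumptions the abstract does not state: that OpenFace 2.0's 68-point output follows the iBUG layout, where 0-based indices 48 to 67 are the 20 mouth points; that "displacement" means each point's offset from its mean position over the clip; that the video runs at 25 fps; and that the mel spectrogram has 80 bins with a 12.5 ms hop. The helpers `lip_displacements`, `resample_to_mel` and `joint_mse`, and the `lip_weight` parameter, are hypothetical.

```python
import numpy as np

# Assumption: iBUG 68-point layout, where 0-based indices 48..67 are the
# 20 mouth landmarks (12 outer-lip points followed by 8 inner-lip points).
LIP_IDX = np.arange(48, 68)


def lip_displacements(landmarks: np.ndarray) -> np.ndarray:
    """Hypothetical label extractor.

    landmarks: (T, 68, 2) per-video-frame (x, y) coordinates.
    Returns (T, 40): each lip point's offset from its mean position over
    the clip (an assumed definition of "displacement"), flattened per frame.
    """
    lips = landmarks[:, LIP_IDX, :]                      # (T, 20, 2)
    disp = lips - lips.mean(axis=0, keepdims=True)       # remove per-clip mean
    return disp.reshape(len(lips), -1)                   # (T, 40)


def resample_to_mel(frames: np.ndarray, video_fps: float,
                    n_mel: int, hop_s: float) -> np.ndarray:
    """Linearly interpolate video-rate features onto mel-frame timestamps.

    Mel frames past the last video frame reuse the edge value (np.interp clamps).
    """
    t_video = np.arange(len(frames)) / video_fps
    t_mel = np.arange(n_mel) * hop_s
    out = np.empty((n_mel, frames.shape[1]))
    for d in range(frames.shape[1]):
        out[:, d] = np.interp(t_mel, t_video, frames[:, d])
    return out


def joint_mse(pred: np.ndarray, target: np.ndarray,
              n_mels: int = 80, lip_weight: float = 1.0) -> float:
    """Hypothetical joint loss: MSE on the mel part plus weighted MSE on the lip part."""
    mel_loss = np.mean((pred[:, :n_mels] - target[:, :n_mels]) ** 2)
    lip_loss = np.mean((pred[:, n_mels:] - target[:, n_mels:]) ** 2)
    return float(mel_loss + lip_weight * lip_loss)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    landmarks = rng.normal(size=(50, 68, 2))   # 2 s of synthetic 25 fps landmarks
    lips = lip_displacements(landmarks)        # (50, 40)
    mel = rng.normal(size=(160, 80))           # synthetic 80-bin mel, 12.5 ms hop
    lips_mel = resample_to_mel(lips, 25.0, len(mel), 0.0125)
    target = np.concatenate([mel, lips_mel], axis=1)   # (160, 120) per-step target
    print(target.shape, joint_mse(target, target))
```

Concatenating the mel frame and the lip frame into one vector per decoder step is one simple way to let a single output projection emit both in one pass; the decoder layout and loss weighting used in the paper may differ.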

research
05/26/2023

Inter-connection: Effective Connection between Pre-trained Encoder and Decoder for Speech Translation

In end-to-end speech translation, speech and text pre-trained models imp...
research
06/14/2019

Realistic Speech-Driven Facial Animation with GANs

Speech-driven facial animation is the process that automatically synthes...
research
06/07/2023

Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization

End-to-end speech summarization (E2E SSum) directly summarizes input spe...
research
05/23/2018

End-to-End Speech-Driven Facial Animation with Temporal GANs

Speech-driven facial animation is the process which uses speech signals ...
research
06/05/2023

LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading

Lip-to-speech involves generating a natural-sounding speech synchronized...
research
07/06/2023

Performance Comparison of Pre-trained Models for Speech-to-Text in Turkish: Whisper-Small and Wav2Vec2-XLS-R-300M

In this study, the performances of the Whisper-Small and Wav2Vec2-XLS-R-...
research
04/02/2023

Recurrence without Recurrence: Stable Video Landmark Detection with Deep Equilibrium Models

Cascaded computation, whereby predictions are recurrently refined over s...