Photorealistic Lip Sync with Adversarial Temporal Convolutional Networks

02/20/2020
by Ruobing Zheng, et al.

Lip sync has emerged as a promising technique for generating mouth movements on a talking head. However, synthesizing a clear, accurate, and human-like performance remains challenging. In this paper, we present a novel lip-sync solution for producing a high-quality, photorealistic talking head from speech. We focus on capturing the specific lip movements and talking style of the target person. We model the seq-to-seq mapping from audio signals to mouth features with two adversarial temporal convolutional networks. Experiments show that our model outperforms traditional RNN-based baselines in both accuracy and speed. We also propose an image-to-image translation-based approach for generating high-resolution, photorealistic face appearance from synthetic facial maps. This fully trainable framework not only avoids cumbersome steps such as candidate-frame selection in graphics-based rendering methods but also solves some existing issues in recent neural network-based solutions. Our work will benefit related applications such as conversational agents, virtual anchors, telepresence, and gaming.
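To make the seq-to-seq idea concrete, here is a minimal sketch of the basic building block of a temporal convolutional network: a dilated 1D convolution that maps a sequence of audio feature frames to a sequence of the same length. It is not the authors' architecture. The function names, the causal zero padding, the channel sizes, and the weights are all assumptions chosen for illustration, and the adversarial training the abstract mentions is not shown.

```python
from typing import List, Sequence

Frame = List[float]


def dilated_causal_conv1d(
    frames: Sequence[Sequence[float]],
    weights: Sequence[Sequence[Sequence[float]]],
    bias: Sequence[float],
    dilation: int = 1,
) -> List[Frame]:
    """Apply one dilated causal 1D convolution over time.

    weights[k][o][i] is the weight for tap k, output channel o, input channel i.
    The output at time t depends on frames[t], frames[t - dilation],
    frames[t - 2 * dilation], and so on; positions before the start of the
    sequence are treated as zeros (causal padding, an assumption here).
    """
    kernel_size = len(weights)
    out_channels = len(bias)
    in_channels = len(frames[0]) if frames else 0
    output: List[Frame] = []
    for t in range(len(frames)):
        y = list(bias)
        for k in range(kernel_size):
            src = t - k * dilation
            if src < 0:
                continue  # zero padding: nothing to add for this tap
            x = frames[src]
            for o in range(out_channels):
                y[o] += sum(weights[k][o][i] * x[i] for i in range(in_channels))
        output.append(y)
    return output


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of input frames that influence one output frame when layers
    with the given dilations (all sharing kernel_size) are stacked."""
    return 1 + (kernel_size - 1) * sum(dilations)


# Hypothetical toy input: four frames with one audio feature each.
audio_frames = [[1.0], [2.0], [3.0], [4.0]]
# Kernel size 2, one output channel, one input channel, all weights 1.0.
toy_weights = [[[1.0]], [[1.0]]]
mouth_features = dilated_causal_conv1d(audio_frames, toy_weights, [0.0], dilation=2)
```

Because every output frame is computed independently of the other outputs, all time steps can be processed in parallel, unlike an RNN, which must step through the sequence. Stacking layers with growing dilations widens the temporal context quickly: `receptive_field(2, [1, 2, 4])` covers 8 input frames with only three layers.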


research
05/23/2018

End-to-End Speech-Driven Facial Animation with Temporal GANs

Speech-driven facial animation is the process which uses speech signals ...
research
09/22/2021

Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation

To the best of our knowledge, we first present a live system that genera...
research
03/29/2017

Unrestricted Facial Geometry Reconstruction Using Image-to-Image Translation

It has been recently shown that neural networks can recover the geometri...
research
10/19/2021

Talking Head Generation with Audio and Speech Related Facial Action Units

The task of talking head generation is to synthesize a lip synchronized ...
research
10/26/2019

Image to Image Translation based on Convolutional Neural Network Approach for Speech Declipping

Clipping, as a current nonlinear distortion, often occurs due to the lim...
research
09/22/2022

VToonify: Controllable High-Resolution Portrait Video Style Transfer

Generating high-quality artistic portrait videos is an important and des...
research
08/20/2019

A Neural Virtual Anchor Synthesizer based on Seq2Seq and GAN Models

This paper presents a novel framework to generate realistic face video o...
