Self-supervised CNN for Unconstrained 3D Facial Performance Capture from an RGB-D Camera

by Yudong Guo, et al.

We present a novel method for real-time 3D facial performance capture with consumer-level RGB-D sensors. Our capture system targets robust and stable 3D face capture in the wild, where the RGB-D facial data contain noise, imperfections and occlusions, and often exhibit high variability in motion, pose, expression and lighting conditions, all of which pose great challenges. The technical contribution is a self-supervised deep learning framework that is trained directly on raw RGB-D data. The key novelties include: (1) learning both the core tensor and the parameters for refining our parametric face model; (2) using vertex displacement and a UV map for learning surface detail; (3) designing a loss function that incorporates temporal coherence and same-identity constraints based on pairs of RGB-D images and uses sparse norms, in addition to the conventional terms for photo-consistency, feature similarity, regularization and geometry consistency; and (4) augmenting the training data set in new ways. The method is demonstrated in a live setup that runs in real time on a smartphone and an RGB-D sensor. Extensive experiments show that our method is robust to severe occlusion, fast motion, large rotation, exaggerated facial expressions and diverse lighting.
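The abstract names the ingredients of the loss (photo-consistency, temporal coherence over image pairs, sparse norms) but gives no formulas. The following is only a minimal sketch of how such terms might be combined, not the authors' formulation. It assumes the sparse norm is an l2,1-style norm (a sum of per-pixel Euclidean norms) and that the terms are mixed with scalar weights; the function names, the weights, and the toy data are all hypothetical.

```python
import math

def l21_norm(residuals):
    """Sum of per-row Euclidean norms (an l2,1-style sparse norm; assumed choice)."""
    return sum(math.sqrt(sum(c * c for c in row)) for row in residuals)

def photo_term(rendered, observed):
    """Mean sparse-norm color residual between rendered and observed pixels."""
    residuals = [[a - b for a, b in zip(p, q)] for p, q in zip(rendered, observed)]
    return l21_norm(residuals) / len(residuals)

def temporal_term(params_a, params_b):
    """Squared difference between parameters predicted for two nearby frames."""
    return sum((a - b) ** 2 for a, b in zip(params_a, params_b))

def total_loss(terms, weights):
    """Weighted sum of named loss terms (weights are hypothetical hyperparameters)."""
    return sum(weights[name] * value for name, value in terms.items())

# Toy data: two RGB pixels and two small parameter vectors from a frame pair.
rendered = [[3.0, 4.0, 0.0], [0.0, 0.0, 0.0]]
observed = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
terms = {
    "photo": photo_term(rendered, observed),
    "temporal": temporal_term([1.0, 2.0], [1.0, 1.0]),
}
loss = total_loss(terms, {"photo": 1.0, "temporal": 0.5})
```

Compared with a squared L2 penalty, a sum of unsquared per-pixel norms grows only linearly with each pixel's error, so a few badly mismatched pixels (for example, under occlusion) dominate the total less.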



