Auto-CARD: Efficient and Robust Codec Avatar Driving for Real-time Mobile Telepresence

by Yonggan Fu, et al.

Real-time, robust, photorealistic avatars are highly desired for immersive telepresence in AR/VR. However, one key bottleneck remains: the considerable computational expense of accurately inferring facial expressions from headset-mounted camera images at a quality level that matches the realism of the avatar's human appearance. To this end, we propose a framework called Auto-CARD, which for the first time enables real-time and robust driving of Codec Avatars using only on-device computing resources. This is achieved by minimizing two sources of redundancy. First, we develop a dedicated neural architecture search technique called AVE-NAS for avatar encoding in AR/VR, which explicitly boosts both the searched architectures' robustness in the presence of extreme facial expressions and their hardware friendliness on fast-evolving AR/VR headsets. Second, we leverage the temporal redundancy in consecutively captured images during continuous rendering and develop a mechanism dubbed LATEX to skip the computation of redundant frames. Specifically, we first identify an opportunity arising from the linearity of the latent space derived by the avatar decoder and then propose to perform adaptive latent extrapolation for redundant frames. For evaluation, we demonstrate the efficacy of our Auto-CARD framework in real-time Codec Avatar driving settings, where we achieve a 5.05x speed-up on Meta Quest 2 while maintaining comparable or even better animation quality than state-of-the-art avatar encoder designs.
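
The frame-skipping idea can be illustrated with a minimal Python sketch. This is not the paper's implementation: the abstract does not specify how LATEX decides that a frame is redundant or how it extrapolates, so everything below is an assumption made for illustration. The names `extrapolate_latent`, `mean_abs_diff`, and `drive` are hypothetical, the redundancy test (mean absolute pixel difference against the last encoded frame, compared to a threshold) is a stand-in criterion, and the extrapolation is a plain constant-velocity step over the two most recent latent codes, which is only reasonable if the latent space behaves approximately linearly.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def extrapolate_latent(z_older: Sequence[float], z_newer: Sequence[float]) -> Vector:
    """Constant-velocity linear extrapolation: z_next = z_newer + (z_newer - z_older)."""
    return [2.0 * b - a for a, b in zip(z_older, z_newer)]


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equally sized flattened frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def drive(
    frames: Sequence[Sequence[float]],
    encode: Callable[[Sequence[float]], Vector],
    threshold: float,
) -> Tuple[List[Vector], int]:
    """Produce one latent code per frame, skipping the encoder on 'redundant' frames.

    Assumption (not from the source): a frame counts as redundant when it differs
    from the last frame that was actually encoded by less than `threshold`.
    """
    latents: List[Vector] = []
    encoder_calls = 0
    last_encoded = None
    for frame in frames:
        can_skip = (
            len(latents) >= 2  # need two past codes to extrapolate
            and last_encoded is not None
            and mean_abs_diff(frame, last_encoded) < threshold
        )
        if can_skip:
            # Reuse the latent trajectory instead of running the encoder.
            z = extrapolate_latent(latents[-2], latents[-1])
        else:
            z = encode(frame)
            encoder_calls += 1
            last_encoded = frame
        latents.append(z)
    return latents, encoder_calls


# Toy usage with a stand-in "encoder" that sums pixel values.
toy_frames = [[0.0, 0.0], [1.0, 1.0], [1.1, 1.1], [5.0, 5.0]]
toy_latents, toy_calls = drive(toy_frames, lambda f: [sum(f)], threshold=0.5)
```

In this toy run the third frame is close to the last encoded frame, so its code is extrapolated rather than encoded, and the encoder runs on three of the four frames. Comparing against the last encoded frame (rather than the immediately preceding one) means slow drift eventually forces a fresh encoder call, which is one simple way to keep extrapolation error from accumulating.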




RT-NeRF: Real-Time On-Device Neural Radiance Fields Towards Immersive AR/VR Rendering

Neural Radiance Field (NeRF) based rendering has attracted growing atten...

Stereo Matching in Time: 100+ FPS Video Stereo Matching for Extended Reality

Real-time Stereo Matching is a cornerstone algorithm for many Extended R...

Instant-3D: Instant Neural Radiance Field Training Towards On-Device AR/VR 3D Reconstruction

Neural Radiance Field (NeRF) based 3D reconstruction is highly desirable...

WebAssembly enables low latency interoperable augmented and virtual reality software

There is a clear difference in runtime performance between native applic...

RT-DNAS: Real-time Constrained Differentiable Neural Architecture Search for 3D Cardiac Cine MRI Segmentation

Accurately segmenting temporal frames of cine magnetic resonance imaging...

Facial Reenactment Through a Personalized Generator

In recent years, the role of image generative models in facial reenactme...

Driving-Signal Aware Full-Body Avatars

We present a learning-based method for building driving-signal aware ful...
