Kinematic-Structure-Preserved Representation for Unsupervised 3D Human Pose Estimation

by Jogendra Nath Kundu, et al.

Estimation of 3D human pose from a monocular image has gained considerable attention as a key step in several human-centric applications. However, the generalizability of human pose estimation models developed using supervision on large-scale in-studio datasets remains questionable, as these models often perform unsatisfactorily in unseen in-the-wild environments. Though weakly-supervised models have been proposed to address this shortcoming, the performance of such models relies on the availability of paired supervision on some related task, such as 2D pose or multi-view image pairs. In contrast, we propose a novel kinematic-structure-preserved unsupervised 3D pose estimation framework, which is not restrained by any paired or unpaired weak supervision. Our pose estimation framework relies on a minimal set of prior knowledge that defines the underlying kinematic 3D structure, such as skeletal joint connectivity information with bone-length ratios in a fixed canonical scale. The proposed model employs three consecutive differentiable transformations: forward kinematics, camera projection, and spatial-map transformation. This design not only acts as a suitable bottleneck that stimulates effective pose disentanglement but also yields interpretable latent pose representations, avoiding the need to train an explicit mapper from latent embedding to pose. Furthermore, avoiding an unstable adversarial setup, we re-utilize the decoder to formalize an energy-based loss, which enables us to learn from in-the-wild videos, beyond laboratory settings. Comprehensive experiments demonstrate our state-of-the-art unsupervised and weakly-supervised pose estimation performance on both the Human3.6M and MPI-INF-3DHP datasets. Qualitative results on unseen environments further establish our superior generalization ability.
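
The chain of three transformations named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the five-joint skeleton, the bone-length ratios, the weak-perspective camera, and the Gaussian heatmap form of the spatial maps are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

# Hypothetical toy skeleton (an assumption, not the paper's): parent index
# per joint, -1 marks the root. Parents must precede their children.
PARENTS = [-1, 0, 1, 1, 0]
# Assumed bone-length ratios in a fixed canonical scale (root has no bone).
BONE_LENGTHS = np.array([0.0, 0.5, 0.25, 0.25, 0.5])

def forward_kinematics(directions, parents=PARENTS, lengths=BONE_LENGTHS):
    """Place joints in 3D from per-bone direction vectors.

    directions: (J, 3) array; row j points from joint j's parent to joint j.
    Rows are normalized, so bone lengths always match the fixed ratios.
    """
    directions = np.asarray(directions, dtype=float)
    norms = np.linalg.norm(directions, axis=1, keepdims=True)
    unit = directions / np.maximum(norms, 1e-8)
    joints = np.zeros_like(unit)  # root stays at the origin
    for j, p in enumerate(parents):
        if p >= 0:
            joints[j] = joints[p] + lengths[j] * unit[j]
    return joints

def project(joints3d, rotation, scale=1.0, translation=(0.0, 0.0)):
    """Weak-perspective camera (assumed): rotate, drop depth, scale, shift."""
    cam = joints3d @ np.asarray(rotation).T
    return scale * cam[:, :2] + np.asarray(translation)

def spatial_maps(joints2d, size=32, sigma=1.5):
    """Render one Gaussian heatmap per joint.

    2D coordinates in [-1, 1] are mapped linearly onto a size x size grid.
    """
    ys, xs = np.mgrid[0:size, 0:size]
    px = (joints2d[:, 0] + 1.0) * 0.5 * (size - 1)
    py = (joints2d[:, 1] + 1.0) * 0.5 * (size - 1)
    d2 = (xs[None] - px[:, None, None]) ** 2 + (ys[None] - py[:, None, None]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Usage: arbitrary bone directions -> 3D joints -> 2D joints -> heatmaps.
rng = np.random.default_rng(0)
bone_dirs = rng.normal(size=(len(PARENTS), 3))
pose3d = forward_kinematics(bone_dirs)
pose2d = project(pose3d, rotation=np.eye(3))
maps = spatial_maps(pose2d)
```

Each step uses only differentiable arithmetic, so the same structure carries over to an automatic-differentiation framework. Because the bone directions are normalized before the fixed lengths are applied, any input yields a skeleton with the prescribed bone-length ratios, which is one way to read the "kinematic-structure-preserved" bottleneck.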



Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis

Camera captured human pose is an outcome of several sources of variation...

Weakly-supervised Pre-training for 3D Human Pose Estimation via Perspective Knowledge

Modern deep learning-based 3D pose estimation approaches require plenty ...

Error Bounds of Projection Models in Weakly Supervised 3D Human Pose Estimation

The current state-of-the-art in monocular 3D human pose estimation is he...

Aligning Silhouette Topology for Self-Adaptive 3D Human Pose Recovery

Articulation-centric 2D/3D pose supervision forms the core training obje...

3D Human Pose Estimation under limited supervision using Metric Learning

Estimating 3D human pose from monocular images demands large amounts of ...

Learning Transferable Kinematic Dictionary for 3D Human Pose and Shape Reconstruction

Estimating 3D human pose and shape from a single image is highly under-c...
