Fusing Monocular Images and Sparse IMU Signals for Real-time Human Motion Capture

by Shaohua Pan, et al.
Tsinghua University

RGB images and inertial signals have each been used for motion capture (mocap), but combining them is a new and interesting topic. We believe that the two modalities are complementary and that their combination can solve the inherent difficulties of using a single modality: occlusions, extreme lighting or texture, and out-of-view subjects for visual mocap, and global drift for inertial mocap. To this end, we propose a method that fuses monocular images and sparse IMUs for real-time human motion capture. Our method uses a dual coordinate strategy to fully exploit the IMU signals for different goals in motion capture. Specifically, one branch transforms the IMU signals into the camera coordinate system to combine them with the image information, while another branch learns from the IMU signals in the body root coordinate system to better estimate body poses. Furthermore, a hidden state feedback mechanism is proposed for both branches to compensate for their respective drawbacks in extreme input cases. Our method can thus easily switch between the two kinds of signals or combine them, depending on the case, to achieve robust mocap, and the two branches can help each other produce better results under different conditions. Quantitative and qualitative results demonstrate that, with a carefully designed fusion method, our technique significantly outperforms the state-of-the-art vision, IMU, and combined methods on both global orientation and local pose estimation. Our code is available for research at https://shaohua-pan.github.io/robustcap-page/.
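The dual coordinate strategy can be illustrated with a minimal sketch. This is not the authors' implementation. It only shows, under stated assumptions, what expressing the same IMU readings in a camera frame and in a body root frame could look like. The function names, the array layout (one rotation matrix and one acceleration vector per sensor, given in a shared world frame), the choice of sensor 0 as the root, and the subtraction of the root acceleration are all assumptions made for illustration.

```python
import numpy as np

def rot_z(theta):
    """Rotation matrix for an angle theta (radians) about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def to_camera_frame(R_imu_world, a_imu_world, R_cam_world):
    """Express sensor orientations (N,3,3) and accelerations (N,3) in the camera frame.

    R_cam_world is assumed to be the camera-to-world rotation (hypothetical input).
    """
    R_world_to_cam = R_cam_world.T
    R_cam = R_world_to_cam @ R_imu_world      # broadcasts over the N sensors
    a_cam = a_imu_world @ R_world_to_cam.T    # row-vector form of R @ a
    return R_cam, a_cam

def to_root_frame(R_imu_world, a_imu_world, root_index=0):
    """Express the same readings relative to the root sensor (assumed at root_index)."""
    R_world_to_root = R_imu_world[root_index].T
    R_rel = R_world_to_root @ R_imu_world
    # Assumption: accelerations are taken relative to the root sensor's acceleration.
    a_rel = (a_imu_world - a_imu_world[root_index]) @ R_world_to_root.T
    return R_rel, a_rel

# Toy input: four sensors sharing one orientation and one acceleration.
R = np.stack([rot_z(0.3)] * 4)
a = np.tile(np.array([1.0, 0.0, 0.0]), (4, 1))
R_cam, a_cam = to_camera_frame(R, a, rot_z(0.3))
R_rel, a_rel = to_root_frame(R, a)
```

In this toy setup the camera shares the sensors' orientation, so the camera-frame orientations reduce to the identity, and the root-frame orientation of the root sensor is the identity by construction. In a pipeline like the one described, the camera-frame features would be combined with image information, while the root-frame features would feed the pose branch.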




