EgoLocate: Real-time Motion Capture, Localization, and Mapping with Sparse Body-mounted Sensors

by Xinyu Yi et al.
Tsinghua University

Human and environment sensing are two important topics in Computer Vision and Graphics. Human motion is often captured with inertial sensors, while the environment is mostly reconstructed using cameras. We integrate the two techniques in EgoLocate, a system that simultaneously performs human motion capture (mocap), localization, and mapping in real time from sparse body-mounted sensors: six inertial measurement units (IMUs) and a monocular phone camera. On one hand, inertial mocap suffers from large translation drift due to the lack of a global positioning signal; EgoLocate leverages image-based simultaneous localization and mapping (SLAM) techniques to locate the human in the reconstructed scene. On the other hand, SLAM often fails when visual features are poor; EgoLocate incorporates inertial mocap to provide a strong prior for the camera motion. Experiments show that localization, a key challenge in both fields, is largely improved by our technique compared with the state of the art in each field. Our code is available for research at
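To illustrate the complementary fusion the abstract describes, here is a minimal, hypothetical sketch (not the authors' actual algorithm) of blending a drifting inertial dead-reckoned position with an occasional visual localization fix, weighted by a visual-feature confidence. All function and variable names here are illustrative assumptions.

```python
import numpy as np

def fuse_translation(imu_pos, slam_pos, slam_conf):
    # Confidence-weighted blend: when visual features are poor (slam_conf -> 0),
    # the estimate falls back to inertial dead reckoning; when tracking is good
    # (slam_conf -> 1), the SLAM fix corrects the accumulated IMU drift.
    return (1.0 - slam_conf) * np.asarray(imu_pos) + slam_conf * np.asarray(slam_pos)

# Toy simulation: the IMU integration drifts by +0.01 m per step along x;
# every 10th step a visual fix near the true position becomes available.
true_pos = np.zeros(3)
est = np.zeros(3)
for step in range(1, 31):
    true_pos = true_pos + np.array([0.10, 0.0, 0.0])  # actual motion
    est = est + np.array([0.11, 0.0, 0.0])            # drifting IMU update
    if step % 10 == 0:                                # visual fix available
        est = fuse_translation(est, true_pos, slam_conf=0.8)

drift_error = np.linalg.norm(est - true_pos)
print(drift_error)  # far below the 0.30 m error of pure dead reckoning
```

In the real system the fusion is of course far richer (the mocap prior constrains the camera pose inside the SLAM optimization rather than being blended afterward), but the sketch captures why the two signals compensate for each other's failure modes.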




