Track to Reconstruct and Reconstruct to Track
Object tracking and reconstruction are often performed together, with tracking used as input for 3D reconstruction. However, the obtained 3D reconstructions also provide useful information that can be exploited to improve tracking. In this paper, we propose a novel method that closes this loop: tracking to reconstruct, and then reconstructing to track. Our approach, MOTSFusion (Multi-Object Tracking, Segmentation and dynamic object Fusion), exploits the 3D motion extracted from dynamic object reconstructions to track objects through long periods of complete occlusion and to recover missing detections. For this, we build up short tracklets using the 2D motion consistency of segmentation masks under optical flow warping. These tracklets are then fused into dynamic 3D object reconstructions, which define the precise 3D object motion. This 3D motion is used to merge tracklets into long-term tracks, even when objects are completely occluded for up to 20 frames, and to locate objects when detections are missing. On the KITTI dataset, our reconstruction-based tracking reduces the number of ID switches of the initial tracklets by more than 50%. This ability results in MOTSFusion outperforming previous approaches in both bounding box and segmentation mask tracking accuracy.
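To make the first stage concrete, the sketch below illustrates how short tracklets can be formed by warping segmentation masks with optical flow and associating them across adjacent frames by mask IoU. This is a minimal illustration under assumed conventions (dense per-pixel flow, greedy one-to-one matching, an illustrative IoU threshold), not the authors' implementation.

```python
import numpy as np

def warp_mask(mask, flow):
    """Warp a binary mask from frame t to t+1 using a dense optical flow field.

    mask: (H, W) boolean array for frame t.
    flow: (H, W, 2) array of (dx, dy) pixel displacements from t to t+1.
    """
    h, w = mask.shape
    warped = np.zeros_like(mask)
    ys, xs = np.nonzero(mask)
    xs_new = np.clip(np.round(xs + flow[ys, xs, 0]).astype(int), 0, w - 1)
    ys_new = np.clip(np.round(ys + flow[ys, xs, 1]).astype(int), 0, h - 1)
    warped[ys_new, xs_new] = True
    return warped

def mask_iou(a, b):
    """Intersection-over-union of two binary masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union > 0 else 0.0

def associate(prev_masks, flow, curr_masks, iou_thresh=0.5):
    """Greedily link masks in frame t to masks in frame t+1 by warped-mask IoU.

    Returns a dict mapping indices in prev_masks to indices in curr_masks.
    The threshold and greedy matching are illustrative assumptions.
    """
    links = {}
    for i, m in enumerate(prev_masks):
        warped = warp_mask(m, flow)
        scores = [mask_iou(warped, c) for c in curr_masks]
        if scores:
            j = int(np.argmax(scores))
            if scores[j] >= iou_thresh:
                links[i] = j
    return links
```

Chaining such per-frame links over consecutive frames yields the short 2D tracklets that are subsequently fused into 3D reconstructions, whose motion is then used to bridge occlusion gaps and recover missing detections.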