Incorporating Temporal Prior from Motion Flow for Instrument Segmentation in Minimally Invasive Surgery Video

by Yueming Jin, et al.

Automatic instrument segmentation in video is a fundamental yet challenging problem for robot-assisted minimally invasive surgery. In this paper, we propose a novel framework that leverages instrument motion information by incorporating a derived temporal prior into an attention pyramid network for accurate segmentation. Our inferred prior provides a reliable indication of the instrument location and shape, and is propagated from the previous frame to the current frame according to inter-frame motion flow. This prior is injected into the middle of an encoder-decoder segmentation network as the initialization of a pyramid of attention modules, to explicitly guide the segmentation output from coarse to fine. In this way, the temporal dynamics and the attention network effectively complement and benefit each other. As an additional use, our temporal prior enables semi-supervised learning with periodically unlabeled video frames, simply by reverse execution. We extensively validate our method on the public 2017 MICCAI EndoVis Robotic Instrument Segmentation Challenge dataset with three different tasks. Our method consistently exceeds the state-of-the-art results across all three tasks by a large margin. Our semi-supervised variant also shows promising potential for reducing annotation cost in clinical practice.
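
To make the idea of propagating a prior along inter-frame motion flow more concrete, here is a minimal sketch, not the authors' implementation. It assumes the previous frame's mask and a dense forward flow field (per-pixel `dx`, `dy` displacements) are already available, and it uses simple nearest-pixel forward warping. The function name `propagate_prior` and the toy inputs are hypothetical; the paper may derive and refine its prior differently.

```python
import numpy as np

def propagate_prior(prev_mask, flow):
    """Warp the previous frame's mask to the current frame (illustrative only).

    prev_mask: (H, W) array, instrument probability or binary mask at frame t-1.
    flow:      (H, W, 2) array, assumed forward flow (dx, dy) from frame t-1 to t.
    """
    h, w = prev_mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Where each previous-frame pixel lands in the current frame (nearest pixel).
    xt = np.rint(xs + flow[..., 0]).astype(int)
    yt = np.rint(ys + flow[..., 1]).astype(int)
    # Discard pixels that move outside the image.
    inside = (xt >= 0) & (xt < w) & (yt >= 0) & (yt < h)
    prior = np.zeros((h, w), dtype=float)
    # If several source pixels land on one target, keep the largest value.
    np.maximum.at(prior, (yt[inside], xt[inside]),
                  prev_mask[inside].astype(float))
    return prior

# Toy example: a 2x2 instrument blob shifted 1 pixel right and 2 pixels down.
prev_mask = np.zeros((5, 5))
prev_mask[1:3, 1:3] = 1.0
flow = np.zeros((5, 5, 2))
flow[..., 0] = 1.0  # dx
flow[..., 1] = 2.0  # dy
prior = propagate_prior(prev_mask, flow)
```

In this toy case the blob ends up at rows 3 to 4 and columns 2 to 3 of `prior`. In a setup like the one the abstract describes, such a warped map would serve as a coarse location and shape hint for the attention modules rather than as a final segmentation.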



Learning Motion Flows for Semi-supervised Instrument Segmentation from Robotic Surgical Video

Performing low hertz labeling for surgical videos at intervals can great...

Exploiting Temporality for Semi-Supervised Video Segmentation

In recent years, there has been remarkable progress in supervised image ...

Efficient Global-Local Memory for Real-time Instrument Segmentation of Robotic Surgical Video

Performing a real-time and accurate instrument segmentation from videos ...

MATIS: Masked-Attention Transformers for Surgical Instrument Segmentation

We propose Masked-Attention Transformers for Surgical Instrument Segment...

C2F-TCN: A Framework for Semi and Fully Supervised Temporal Action Segmentation

Temporal action segmentation tags action labels for every frame in an in...

Medical Instrument Segmentation in 3D US by Hybrid Constrained Semi-Supervised Learning

Medical instrument segmentation in 3D ultrasound is essential for image-...

Paying More Attention to Motion: Attention Distillation for Learning Video Representations

We address the challenging problem of learning motion representations us...
