Surgical Phase Recognition of Short Video Shots Based on Temporal Modeling of Deep Features

by Constantinos Loukas, et al.

Recognizing the phases of a laparoscopic surgery (LS) operation from its video constitutes a fundamental step for efficient content representation, indexing and retrieval in surgical video databases. In the literature, most techniques focus on phase segmentation of the entire LS video using hand-crafted visual features, instrument usage signals, and, more recently, convolutional neural networks (CNNs). In this paper, we address the problem of phase recognition of short video shots (10 s) of the operation, without using information about the preceding or forthcoming video frames, their phase labels, or the instruments used. We investigate four state-of-the-art CNN architectures (AlexNet, VGG19, GoogLeNet, and ResNet101) for feature extraction via transfer learning. Visual saliency was employed to select the most informative region of the image as input to the CNN. Video shot representation was based on two temporal pooling mechanisms. Most importantly, we investigate the role of 'elapsed time' (from the beginning of the operation), and we show that including this feature can increase performance dramatically (69% [...]). A long short-term memory (LSTM) network was trained for video shot classification based on the fusion of CNN features with 'elapsed time', increasing the accuracy to 86%. The results highlight the prominent role of visual saliency, long-range temporal recursion, and 'elapsed time' (a feature so far ignored) for surgical phase recognition.
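The abstract names the pipeline stages (saliency-based region selection, temporal pooling of per-frame CNN features, fusion with 'elapsed time') but not their exact form. The following is a minimal NumPy sketch under stated assumptions, not the paper's implementation: the square crop window, the choice of mean and max as the two pooling mechanisms, and scaling elapsed time by a fixed constant (`time_scale_s`, hypothetical) are all illustrative guesses. The helper names are hypothetical, and the per-frame CNN features are assumed to come from some pretrained backbone not shown here.

```python
import numpy as np


def most_salient_window(saliency, win):
    """Top-left (row, col) of the win x win window with the largest summed saliency.

    Hypothetical helper: the abstract only says saliency selects the most
    informative region; a fixed square window is an assumption.
    """
    # Integral image with a zero border, so every window sum costs O(1).
    ii = np.pad(saliency, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    sums = ii[win:, win:] - ii[:-win, win:] - ii[win:, :-win] + ii[:-win, :-win]
    r, c = np.unravel_index(np.argmax(sums), sums.shape)
    return int(r), int(c)


def pool_shot(frame_feats, mode="mean"):
    """Pool (T, D) per-frame CNN features of one shot into a single D-vector.

    Mean and max pooling are assumed here as the two temporal pooling mechanisms.
    """
    if mode == "mean":
        return frame_feats.mean(axis=0)
    if mode == "max":
        return frame_feats.max(axis=0)
    raise ValueError(f"unknown pooling mode: {mode}")


def shot_descriptor(frame_feats, elapsed_s, mode="mean", time_scale_s=3600.0):
    """Concatenate the pooled shot feature with a scaled 'elapsed time' value.

    Dividing by a fixed time_scale_s (assumed) avoids needing the total
    operation length, which would not be known for an isolated shot.
    """
    t = elapsed_s / time_scale_s
    return np.concatenate([pool_shot(frame_feats, mode), [t]])


def sequence_with_time(frame_feats, elapsed_s, time_scale_s=3600.0):
    """Append the same scaled time value to every frame: (T, D) -> (T, D + 1).

    A sequence in this form could be fed to a recurrent model such as an LSTM.
    """
    t = np.full((frame_feats.shape[0], 1), elapsed_s / time_scale_s)
    return np.hstack([frame_feats, t])
```

For example, with `feats = np.array([[1.0, 4.0], [3.0, 2.0]])` and a shot starting 30 minutes into the operation, `shot_descriptor(feats, 1800.0, mode="max")` gives `[3.0, 4.0, 0.5]`. The crop itself would be `frame[r:r + win, c:c + win]` using the corner returned by `most_salient_window`. The pooled descriptor discards frame order, whereas the sequence form keeps it for a recurrent model; which variant matches the paper's exact fusion is not stated in the abstract.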
