TUNeS: A Temporal U-Net with Self-Attention for Video-based Surgical Phase Recognition

by Isabel Funke, et al.

To enable context-aware computer assistance in the operating room of the future, cognitive systems need to understand automatically which surgical phase is being performed by the medical team. The primary source of information for surgical phase recognition is typically video, which presents two challenges: extracting meaningful features from the video stream and effectively modeling temporal information in the sequence of visual features. For temporal modeling, attention mechanisms have gained popularity due to their ability to capture long-range dependencies. In this paper, we explore design choices for attention in existing temporal models for surgical phase recognition and propose a novel approach that does not resort to local attention or regularization of attention weights: TUNeS is an efficient and simple temporal model that incorporates self-attention at the coarsest stage of a U-Net-like structure. In addition, we propose to train the feature extractor, a standard CNN, together with an LSTM on preferably long video segments, i.e., with long temporal context. In our experiments, all temporal models performed better on top of feature extractors that were trained with longer temporal context. On top of these contextualized features, TUNeS achieves state-of-the-art results on Cholec80.
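The core architectural idea in the abstract, self-attention applied only at the coarsest stage of a U-Net-like temporal structure, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`avg_pool`, `upsample`, `self_attention`, `temporal_unet`), the use of average pooling and nearest-neighbour upsampling, the additive skip connections, and the absence of any learned weights are all assumptions made for illustration. The sketch only shows the data flow: the frame-wise feature sequence is shortened stage by stage, global attention runs over the short coarse sequence, and the result is upsampled back to full temporal resolution.

```python
import math

def avg_pool(seq, factor=2):
    """Downsample a sequence of feature vectors by averaging groups of `factor` frames."""
    out = []
    for i in range(0, len(seq), factor):
        chunk = seq[i:i + factor]
        out.append([sum(col) / len(chunk) for col in zip(*chunk)])
    return out

def upsample(seq, length, factor=2):
    """Nearest-neighbour upsampling of `seq` back to `length` frames."""
    return [list(seq[min(i // factor, len(seq) - 1)]) for i in range(length)]

def self_attention(seq):
    """Scaled dot-product self-attention with identity projections (no learned weights)."""
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        m = max(scores)  # subtract the maximum for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, seq)) / total
                    for j in range(d)])
    return out

def temporal_unet(features, depth=2):
    """Toy U-Net over time: pool `depth` times, attend at the coarsest stage, upsample with skips."""
    skips = []
    x = features
    for _ in range(depth):
        skips.append(x)   # keep the finer-resolution sequence for the skip connection
        x = avg_pool(x)
    x = self_attention(x)  # global attention, but only over the short coarse sequence
    for skip in reversed(skips):
        x = upsample(x, len(skip))
        x = [[a + b for a, b in zip(u, s)] for u, s in zip(x, skip)]  # additive skip
    return x

# Hypothetical input: 8 frames with 2-dimensional features each.
frames = [[float(t), 1.0] for t in range(8)]
refined = temporal_unet(frames, depth=2)
print(len(refined), len(refined[0]))
```

Because the cost of self-attention grows quadratically with sequence length, halving the sequence twice before attending reduces the number of attention scores by a factor of 16 in this toy setting. A real model would replace the pooling, upsampling, and identity projections with learned layers.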


SF-TMN: SlowFast Temporal Modeling Network for Surgical Phase Recognition

Automatic surgical phase recognition is one of the key technologies to s...

Surgical Phase Recognition of Short Video Shots Based on Temporal Modeling of Deep Features

Recognizing the phases of a laparoscopic surgery (LS) operation from its...

Automatic Depression Detection via Learning and Fusing Features from Visual Cues

Depression is one of the most prevalent mental disorders, which seriousl...

Aggregating Long-Term Context for Learning Surgical Workflows

Analyzing surgical workflow is crucial for computers to understand surge...

Metrics Matter in Surgical Phase Recognition

Surgical phase recognition is a basic component for different context-aw...

CataNet: Predicting remaining cataract surgery duration

Cataract surgery is a sight-saving surgery that is performed over 10 mil...

Identification of Cognitive Workload during Surgical Tasks with Multimodal Deep Learning

The operating room (OR) is a dynamic and complex environment consisting ...
