Whether and When does Endoscopy Domain Pretraining Make Sense?

03/30/2023
by Dominik Batić et al.

Automated endoscopy video analysis is a challenging task in medical computer vision, with the primary objective of assisting surgeons during procedures. The difficulty arises from the complexity of surgical scenes and the scarcity of annotated data. In recent years, large-scale pretraining has shown great success in the natural language processing and computer vision communities. These approaches reduce the need for annotated data, which is always a concern in the medical domain. However, most works on endoscopic video understanding use models pretrained on natural images, creating a domain gap between pretraining and finetuning. In this work, we investigate the need for endoscopy domain-specific pretraining based on downstream objectives. To this end, we first collect Endo700k, the largest publicly available corpus of endoscopic images, extracted from nine public Minimally Invasive Surgery (MIS) datasets. Endo700k comprises more than 700,000 unannotated raw images. Next, we introduce EndoViT, an endoscopy-pretrained Vision Transformer (ViT). Through ablations, we demonstrate that domain-specific pretraining is particularly beneficial for more complex downstream tasks, such as Action Triplet Detection, but less effective, or even unnecessary, for simpler tasks, such as Surgical Phase Recognition. We will release both our code and pretrained models upon acceptance to facilitate further research in this direction.
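To make the pretrain-then-finetune setup concrete, below is a minimal PyTorch sketch of finetuning a ViT backbone initialised from endoscopy-pretrained weights on a downstream task such as Surgical Phase Recognition. The checkpoint filename, the number of phase classes, and the choice of timm's ViT-B/16 are illustrative assumptions; the abstract does not specify the pretraining objective or the paper's actual training recipe.

```python
# Hedged sketch: supervised finetuning of a domain-pretrained ViT on an
# endoscopy downstream task (e.g. surgical phase recognition).
# "endovit_pretrained.pth" and NUM_PHASES are hypothetical placeholders.
from pathlib import Path

import torch
import timm

NUM_PHASES = 7  # assumption: e.g. the 7 surgical phases of Cholec80

# Standard ViT-B/16 backbone with a freshly initialised classification head.
model = timm.create_model("vit_base_patch16_224", pretrained=False,
                          num_classes=NUM_PHASES)

# Load endoscopy-pretrained backbone weights if available; the new head has
# no matching keys, so strict=False ignores it.
ckpt = Path("endovit_pretrained.pth")
if ckpt.exists():
    state = torch.load(ckpt, map_location="cpu")
    model.load_state_dict(state, strict=False)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)
criterion = torch.nn.CrossEntropyLoss()


def finetune_step(frames: torch.Tensor, labels: torch.Tensor) -> float:
    """One supervised finetuning step on a batch of endoscopic frames."""
    model.train()
    optimizer.zero_grad()
    logits = model(frames)            # shape: (batch, NUM_PHASES)
    loss = criterion(logits, labels)
    loss.backward()
    optimizer.step()
    return loss.item()


# Example usage with random tensors standing in for a real dataloader.
frames = torch.randn(4, 3, 224, 224)
labels = torch.randint(0, NUM_PHASES, (4,))
print(finetune_step(frames, labels))
```

The same backbone-plus-head pattern applies to the more complex downstream tasks mentioned above (e.g. Action Triplet Detection), only with a different head and loss.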
