Multimodal Self-Supervised Learning for Medical Image Analysis

by Aiham Taleb, et al.

In this paper, we propose a self-supervised learning approach that leverages multiple imaging modalities to increase data efficiency for medical image analysis. To this end, we introduce multimodal puzzle-solving proxy tasks, which facilitate neural network representation learning from multiple image modalities. These representations can subsequently be fine-tuned on different downstream tasks. To learn them, we employ the Sinkhorn operator to predict permutations of puzzle pieces, in conjunction with a modality-agnostic feature embedding. Together, these components allow for a lean network architecture and increased computational efficiency. Under this framework, we propose different strategies for puzzle construction that integrate multiple medical imaging modalities with varying levels of puzzle complexity. We benchmark these strategies in a range of experiments to assess the gains of our method in downstream performance and data efficiency on different target tasks. Our experiments show that solving puzzles interleaved with multimodal content yields more powerful semantic representations than treating each modality independently, which allows us to solve downstream tasks more accurately and efficiently. We demonstrate the effectiveness of the proposed approach on two multimodal medical imaging benchmarks, the BraTS and the Prostate semantic segmentation datasets, on which we achieve results competitive with state-of-the-art solutions at a fraction of the computational expense. We also outperform many previous solutions on the chosen benchmarks.
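The abstract does not spell out implementation details, so the following is only a minimal sketch under stated assumptions of the two ideas it names: building a puzzle whose pieces are interleaved across modalities, and using the Sinkhorn operator to turn a matrix of piece-to-position scores into an (approximately) doubly stochastic matrix from which a permutation is read off. The function names, the "pick a random modality per tile" construction, the temperature `tau`, and the iteration count are all hypothetical illustrations, not the authors' method; the hand-made score matrix stands in for what a trained network would output. Hard decoding here is brute force over all permutations, which is only practical for tiny puzzles; larger grids would typically use the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`).

```python
import itertools
import math
import random


def make_multimodal_puzzle(modalities, grid, seed=0):
    """Hypothetical construction: cut co-registered 2D images (lists of rows) into
    grid x grid tiles, fill each tile position from a randomly chosen modality,
    then shuffle. shuffled[i] came from original position perm[i]."""
    rng = random.Random(seed)
    height, width = len(modalities[0]), len(modalities[0][0])
    th, tw = height // grid, width // grid
    tiles = []
    for r in range(grid):
        for c in range(grid):
            img = modalities[rng.randrange(len(modalities))]  # assumed: one modality per tile
            tiles.append([row[c * tw:(c + 1) * tw] for row in img[r * th:(r + 1) * th]])
    perm = list(range(len(tiles)))
    rng.shuffle(perm)
    return [tiles[p] for p in perm], perm


def _logsumexp(values):
    # Numerically stable log(sum(exp(v))).
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sinkhorn(scores, n_iters=20, tau=1.0):
    """Sinkhorn operator in log space: alternately normalize the rows and columns
    of exp(scores / tau), converging toward a doubly stochastic matrix."""
    n = len(scores)
    log_p = [[s / tau for s in row] for row in scores]
    for _ in range(n_iters):
        row_lse = [_logsumexp(row) for row in log_p]
        log_p = [[v - row_lse[i] for v in row] for i, row in enumerate(log_p)]
        col_lse = [_logsumexp([log_p[i][j] for i in range(n)]) for j in range(n)]
        log_p = [[log_p[i][j] - col_lse[j] for j in range(n)] for i in range(n)]
    return [[math.exp(v) for v in row] for row in log_p]


def decode_permutation(p):
    """Exact hard decoding by brute force (only feasible for very small puzzles)."""
    n = len(p)
    best = max(itertools.permutations(range(n)),
               key=lambda cand: sum(math.log(p[i][cand[i]] + 1e-12) for i in range(n)))
    return list(best)


# Usage: two toy 4x4 "modalities" and a 2x2 puzzle (4 pieces).
t1 = [[1] * 4 for _ in range(4)]  # stand-in for one modality
t2 = [[2] * 4 for _ in range(4)]  # stand-in for a second, co-registered modality
pieces, true_perm = make_multimodal_puzzle([t1, t2], grid=2, seed=3)
# Stand-in for network output: a high score where shuffled piece i belongs at true_perm[i].
scores = [[4.0 if j == true_perm[i] else 0.0 for j in range(4)] for i in range(4)]
P = sinkhorn(scores)
print(decode_permutation(P) == true_perm)  # True
```

Working in log space keeps the repeated normalizations numerically stable, and lowering `tau` pushes the output closer to a hard permutation matrix.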


3D Self-Supervised Methods for Medical Imaging

Self-supervised learning methods have witnessed a recent surge of intere...

Self-Supervised Learning for 3D Medical Image Analysis using 3D SimCLR and Monte Carlo Dropout

Self-supervised learning methods can be used to learn meaningful represe...

Stain-Adaptive Self-Supervised Learning for Histopathology Image Analysis

It is commonly recognized that color variations caused by differences in...

Self-supervised Representation Learning for Ultrasound Video

Recent advances in deep learning have achieved promising performance for...

A unified representation network for segmentation with missing modalities

Over the last few years machine learning has demonstrated groundbreaking...

Parameter-Efficient Fine-Tuning for Medical Image Analysis: The Missed Opportunity

We present a comprehensive evaluation of Parameter-Efficient Fine-Tuning...

Artificial-Spiking Hierarchical Networks for Vision-Language Representation Learning

With the success of self-supervised learning, multimodal foundation mode...