HybridMIM: A Hybrid Masked Image Modeling Framework for 3D Medical Image Segmentation

by Zhaohu Xing, et al.

Masked image modeling (MIM) with transformer backbones has recently been exploited as a powerful self-supervised pre-training technique. Existing MIM methods mask random patches of the image and reconstruct the missing pixels, which captures only low-level semantic information and incurs a long pre-training time. This paper presents HybridMIM, a novel hybrid self-supervised learning method based on masked image modeling for 3D medical image segmentation. Specifically, we design a two-level masking hierarchy that specifies which patches in each sub-volume are masked and how, effectively imposing constraints from higher-level semantic information. We then learn the semantic information of medical images at three levels: 1) partial region prediction to reconstruct key contents of the 3D image, which largely reduces the pre-training time burden (pixel-level); 2) patch-masking perception to learn the spatial relationships between the patches in each sub-volume (region-level); and 3) dropout-based contrastive learning between samples within a mini-batch, which further improves the generalization ability of the framework (sample-level). The proposed framework is versatile enough to support both CNN and transformer encoder backbones, and also enables pre-training of decoders for image segmentation. We conduct comprehensive experiments on four widely used public medical image segmentation datasets: BraTS2020, BTCV, MSD Liver, and MSD Spleen. The experimental results show the clear superiority of HybridMIM over competing supervised methods, masked pre-training approaches, and other self-supervised methods, in terms of quantitative metrics, timing performance, and qualitative observations. The code for HybridMIM is available at https://github.com/ge-xing/HybridMIM
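The two-level masking hierarchy described above can be illustrated with a minimal sketch: first select a subset of sub-volumes, then mask random patches within each selected sub-volume. The function name, sizes, and masking ratios below are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def two_level_mask(volume, sub_size=32, patch_size=8,
                   sub_ratio=0.5, patch_ratio=0.5, rng=None):
    """Illustrative two-level masking for a 3D volume (assumed parameters).

    Level 1: choose which sub-volumes participate in masking.
    Level 2: mask random patches inside each chosen sub-volume.
    Returns the masked volume and the boolean mask that was applied.
    """
    rng = rng or np.random.default_rng(0)
    D, H, W = volume.shape
    mask = np.zeros_like(volume, dtype=bool)

    # Enumerate sub-volume origins on a regular grid.
    subs = [(d, h, w)
            for d in range(0, D, sub_size)
            for h in range(0, H, sub_size)
            for w in range(0, W, sub_size)]

    # Level 1: pick a fraction of sub-volumes to mask.
    n_subs = max(1, int(sub_ratio * len(subs)))
    for i in rng.choice(len(subs), size=n_subs, replace=False):
        d0, h0, w0 = subs[i]
        patches = [(d, h, w)
                   for d in range(d0, min(d0 + sub_size, D), patch_size)
                   for h in range(h0, min(h0 + sub_size, H), patch_size)
                   for w in range(w0, min(w0 + sub_size, W), patch_size)]
        # Level 2: mask a fraction of patches within this sub-volume.
        n_patches = max(1, int(patch_ratio * len(patches)))
        for j in rng.choice(len(patches), size=n_patches, replace=False):
            d, h, w = patches[j]
            mask[d:d + patch_size, h:h + patch_size, w:w + patch_size] = True

    masked = volume.copy()
    masked[mask] = 0.0  # zero out masked voxels for reconstruction
    return masked, mask
```

A pre-training pipeline would feed `masked` to the encoder, reconstruct the masked regions (pixel-level), and ask an auxiliary head to predict which patches were masked in each sub-volume (region-level).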


