U-Netmer: U-Net meets Transformer for medical image segmentation

by   Sheng He, et al.

The combination of U-Net-based deep learning models and the Transformer is a new trend in medical image segmentation. U-Net can extract detailed local semantic and texture information, while the Transformer can learn long-range dependencies among pixels in the input image. However, directly adapting the Transformer for segmentation raises two problems: the “token-flatten” problem (local patches are flattened into 1D tokens, which loses the interaction among pixels within each patch) and the “scale-sensitivity” problem (a fixed scale is used to split the input image into local patches). Instead of directly combining U-Net and the Transformer, we propose a new global-local combination of the two, named U-Netmer, to solve these two problems. U-Netmer splits an input image into local patches. The global-context information among local patches is learned by the self-attention mechanism of the Transformer, and U-Net segments each local patch rather than flattening it into tokens, solving the “token-flatten” problem. U-Netmer can segment the input image with different patch sizes using the identical structure and the same parameters; it can therefore be trained with different patch sizes to solve the “scale-sensitivity” problem. We conduct extensive experiments on 7 public datasets covering 7 organs (brain, heart, breast, lung, polyp, pancreas, and prostate) and 4 imaging modalities (MRI, CT, ultrasound, and endoscopy) to show that the proposed U-Netmer can be generally applied to improve the accuracy of medical image segmentation. These experimental results show that U-Netmer achieves state-of-the-art performance compared to baselines and other models. In addition, the discrepancy among the outputs of U-Netmer at different scales is linearly correlated with segmentation accuracy, and can be used as a confidence score to rank test images by difficulty without ground truth.
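The abstract's two mechanisms, splitting an image into local patches that are segmented individually, and measuring the discrepancy between segmentations produced at different patch scales as a ground-truth-free confidence score, can be sketched in a few lines. This is a minimal NumPy illustration based only on the abstract, not the authors' implementation; the patch layout, the threshold "segmenter" stand-in, and the Dice-based discrepancy are all assumptions for illustration.

```python
import numpy as np

def split_into_patches(img, p):
    # Split an H x W image into non-overlapping p x p local patches
    # (assumes H and W are divisible by p).
    H, W = img.shape
    return img.reshape(H // p, p, W // p, p).transpose(0, 2, 1, 3).reshape(-1, p, p)

def merge_patches(patches, H, W):
    # Reassemble per-patch outputs back into an H x W map.
    p = patches.shape[1]
    return patches.reshape(H // p, W // p, p, p).transpose(0, 2, 1, 3).reshape(H, W)

def dice(a, b, eps=1e-8):
    # Dice overlap between two binary masks.
    inter = np.logical_and(a, b).sum()
    return (2.0 * inter + eps) / (a.sum() + b.sum() + eps)

def segment_multiscale(img, patch_sizes, segmenter):
    # Run the same segmenter on patches of several sizes; in U-Netmer the
    # identical network handles every scale. `segmenter` here is any
    # per-patch function returning a binary mask.
    outputs = []
    for p in patch_sizes:
        patches = split_into_patches(img, p)
        seg = np.stack([segmenter(pt) for pt in patches])
        outputs.append(merge_patches(seg, *img.shape))
    return outputs

def confidence_score(outputs):
    # Mean pairwise Dice agreement across scales: low discrepancy
    # (high agreement) is taken as high confidence, per the abstract.
    scores = [dice(outputs[i], outputs[j])
              for i in range(len(outputs)) for j in range(i + 1, len(outputs))]
    return float(np.mean(scores))
```

A toy stand-in segmenter such as `lambda patch: patch > patch.mean()` is enough to exercise the pipeline: images whose multi-scale outputs disagree get a low confidence score and can be flagged as difficult without any ground-truth mask.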


Dilated-UNet: A Fast and Accurate Medical Image Segmentation Approach using a Dilated Transformer and U-Net Architecture

Medical image segmentation is crucial for the development of computer-ai...

DS-TransUNet: Dual Swin Transformer U-Net for Medical Image Segmentation

Automatic medical image segmentation has made great progress benefit fro...

D-Former: A U-shaped Dilated Transformer for 3D Medical Image Segmentation

Computer-aided medical image segmentation has been applied widely in dia...

TransClaw U-Net: Claw U-Net with Transformers for Medical Image Segmentation

In recent years, computer-aided diagnosis has become an increasingly pop...

HST-MRF: Heterogeneous Swin Transformer with Multi-Receptive Field for Medical Image Segmentation

The Transformer has been successfully used in medical image segmentation...

Contrastive Transformer: Contrastive Learning Scheme with Transformer innate Patches

This paper presents Contrastive Transformer, a contrastive learning sche...

Cascaded Cross-Attention Networks for Data-Efficient Whole-Slide Image Classification Using Transformers

Whole-Slide Imaging allows for the capturing and digitization of high-re...
