Multi-scale Efficient Graph-Transformer for Whole Slide Image Classification

by   Saisai Ding, et al.

The multi-scale information among the whole slide images (WSIs) is essential for cancer diagnosis. Although the existing multi-scale vision Transformer has shown its effectiveness for learning multi-scale image representation, it still cannot work well on the gigapixel WSIs due to their extremely large image sizes. To this end, we propose a novel Multi-scale Efficient Graph-Transformer (MEGT) framework for WSI classification. The key idea of MEGT is to adopt two independent Efficient Graph-based Transformer (EGT) branches to process the low-resolution and high-resolution patch embeddings (i.e., tokens in a Transformer) of WSIs, respectively, and then fuse these tokens via a multi-scale feature fusion module (MFFM). Specifically, we design an EGT to efficiently learn the local-global information of patch tokens, which integrates the graph representation into Transformer to capture spatial-related information of WSIs. Meanwhile, we propose a novel MFFM to alleviate the semantic gap among different resolution patches during feature fusion, which creates a non-patch token for each branch as an agent to exchange information with another branch by cross-attention. In addition, to expedite network training, a novel token pruning module is developed in EGT to reduce the redundant tokens. Extensive experiments on TCGA-RCC and CAMELYON16 datasets demonstrate the effectiveness of the proposed MEGT.


page 1

page 3

page 8


CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification

The recently developed vision transformer (ViT) has achieved promising r...

Multi-Scale Prototypical Transformer for Whole Slide Image Classification

Whole slide image (WSI) classification is an essential task in computati...

PPT Fusion: Pyramid Patch Transformerfor a Case Study in Image Fusion

The Transformer architecture has achieved rapiddevelopment in recent yea...

Few-Shot Learning Meets Transformer: Unified Query-Support Transformers for Few-Shot Classification

Few-shot classification which aims to recognize unseen classes using ver...

Trusted Multi-Scale Classification Framework for Whole Slide Image

Despite remarkable efforts been made, the classification of gigapixels w...

Improved Image Classification with Token Fusion

In this paper, we propose a method using the fusion of CNN and transform...

Bilateral-ViT for Robust Fovea Localization

The fovea is an important anatomical landmark of the retina. Detecting t...

Please sign up or login with your details

Forgot password? Click here to reset