MRL: Learning to Mix with Attention and Convolutions

08/30/2022
by Shlok Mohta, et al.

In this paper, we present a new neural architecture block for the vision domain, named Mixing Regionally and Locally (MRL), developed with the aim of effectively and efficiently mixing the provided input features. We split the input feature mixing task into mixing at a regional scale and mixing at a local scale. To achieve an efficient mix, we exploit the domain-wide receptive field provided by self-attention for regional-scale mixing and convolutional kernels restricted to a local neighbourhood for local-scale mixing. More specifically, our proposed method first mixes regional features, each associated with the local features of a defined region, and then performs a local-scale feature mix augmented by the regional features. Experiments show that this hybridization of self-attention and convolution brings improved capacity, generalization (the right inductive bias), and efficiency. Under similar network settings, MRL outperforms or is on par with its counterparts in classification, object detection, and segmentation tasks. We also show that our MRL-based network architecture achieves state-of-the-art performance on H&E histology datasets, reaching Dice scores of 0.843, 0.855, and 0.892 on the Kumar, CoNSeP, and CPM-17 datasets, respectively, while highlighting the versatility of the MRL framework by incorporating layers such as group convolutions to improve dataset-specific generalization.
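
As a rough illustration of the two-stage mixing described above, the following is a minimal, hypothetical PyTorch sketch of an MRL-style block. It is not the authors' implementation: the name MRLBlock, the region_size parameter, the use of average pooling to form region tokens, nearest-neighbour upsampling to broadcast them back, and a depthwise convolution for the local mix are all assumptions made for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MRLBlock(nn.Module):
    """Hypothetical sketch of a Mixing Regionally and Locally (MRL) block.

    Regional-scale mixing: one token per region (obtained here by average
    pooling) is mixed globally with multi-head self-attention, giving a
    domain-wide receptive field at low cost. Local-scale mixing: a depthwise
    convolution mixes features within a local neighbourhood, after the mixed
    regional features have been broadcast back and added to the local ones.
    """

    def __init__(self, dim, region_size=7, num_heads=4):
        super().__init__()
        self.pool = nn.AvgPool2d(kernel_size=region_size, stride=region_size)
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Depthwise conv restricted to a local neighbourhood for local mixing.
        self.local_mix = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)

    def forward(self, x):                                # x: (B, C, H, W)
        B, C, H, W = x.shape
        # 1) Regional-scale mixing: self-attention over pooled region tokens.
        regions = self.pool(x)                           # (B, C, H/r, W/r)
        Hr, Wr = regions.shape[-2:]
        tokens = regions.flatten(2).transpose(1, 2)      # (B, Hr*Wr, C)
        tokens = self.norm(tokens)
        mixed, _ = self.attn(tokens, tokens, tokens)     # global receptive field
        mixed = mixed.transpose(1, 2).reshape(B, C, Hr, Wr)
        # 2) Broadcast the mixed regional features back to the full grid.
        up = F.interpolate(mixed, size=(H, W), mode="nearest")
        # 3) Local-scale mixing augmented by the regional features.
        return x + self.local_mix(x + up)
```

As a usage sketch, block = MRLBlock(dim=64, region_size=7) applied to a (1, 64, 56, 56) feature map returns a tensor of the same shape; running attention only over the pooled region tokens rather than every pixel is what keeps the regional mixing efficient in this sketch.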

Related research

- RegionViT: Regional-to-Local Attention for Vision Transformers (06/04/2021)
  Vision transformer (ViT) has recently showed its strong capability in ac...
- Efficient and Explicit Modelling of Image Hierarchies for Image Restoration (03/01/2023)
  The aim of this paper is to propose a mechanism to efficiently and expli...
- S^2-MLP: Spatial-Shift MLP Architecture for Vision (06/14/2021)
  Recently, visual Transformer (ViT) and its following works abandon the c...
- Deeper or Wider Networks of Point Clouds with Self-attention? (11/29/2020)
  Prevalence of deeper networks driven by self-attention is in stark contr...
- Scaling Local Self-Attention For Parameter Efficient Visual Backbones (03/23/2021)
  Self-attention has the promise of improving computer vision systems due ...
- WaveMix: Resource-efficient Token Mixing for Images (03/07/2022)
  Although certain vision transformer (ViT) and CNN architectures generali...
- Discovering Spatial Relationships by Transformers for Domain Generalization (08/23/2021)
  Due to the rapid increase in the diversity of image data, the problem of...
