An Impartial Take to the CNN vs Transformer Robustness Contest

07/22/2022
by Francesco Pinto, et al.

Following the surge in popularity of Transformers in Computer Vision, several studies have attempted to determine whether they are more robust to distribution shifts and provide better uncertainty estimates than Convolutional Neural Networks (CNNs). The almost unanimous conclusion is that they are, and it is often conjectured, more or less explicitly, that this supposed superiority is due to the self-attention mechanism. In this paper we perform extensive empirical analyses showing that recent state-of-the-art CNNs (particularly ConvNeXt) can be as robust and reliable as the current state-of-the-art Transformers, and sometimes even more so. However, there is no clear winner. Therefore, although it is tempting to declare the definitive superiority of one family of architectures over another, they seem to enjoy similarly extraordinary performance on a variety of tasks while also suffering from similar vulnerabilities such as texture, background, and simplicity biases.


research
01/21/2022

A Comprehensive Study of Vision Transformers on Dense Prediction Tasks

Convolutional Neural Networks (CNNs), architectures consisting of convol...
research
11/02/2021

Can Vision Transformers Perform Convolution?

Several recent studies have demonstrated that attention-based networks, ...
research
06/24/2021

Exploring Corruption Robustness: Inductive Biases in Vision Transformers and MLP-Mixers

Recently, vision transformers and MLP-based models have been developed i...
research
05/17/2021

Vision Transformers are Robust Learners

Transformers, composed of multiple self-attention layers, hold strong pr...
research
01/25/2023

Out of Distribution Performance of State of Art Vision Model

The vision transformer (ViT) has advanced to the cutting edge in the vis...
research
02/19/2023

MedViT: A Robust Vision Transformer for Generalized Medical Image Classification

Convolutional Neural Networks (CNNs) have advanced existing medical syst...
research
07/27/2022

Convolutional Embedding Makes Hierarchical Vision Transformer Stronger

Vision Transformers (ViTs) have recently dominated a range of computer v...
