An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

10/22/2020
by   Alexey Dosovitskiy, et al.
6

While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place. We show that this reliance on CNNs is not necessary and a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. When pre-trained on large amounts of data and transferred to multiple mid-sized or small image recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.

READ FULL TEXT

page 8

page 17

page 21

research
03/02/2022

Aggregated Pyramid Vision Transformer: Split-transform-merge Strategy for Image Recognition without Convolutions

With the achievements of Transformer in the field of natural language pr...
research
05/04/2021

MLP-Mixer: An all-MLP Architecture for Vision

Convolutional Neural Networks (CNNs) are the go-to model for computer vi...
research
01/25/2022

Convolutional Xformers for Vision

Vision transformers (ViTs) have found only limited practical use in proc...
research
06/10/2021

Scaling Vision with Sparse Mixture of Experts

Sparsely-gated Mixture of Experts networks (MoEs) have demonstrated exce...
research
12/20/2013

Fast Training of Convolutional Networks through FFTs

Convolutional networks are one of the most widely employed architectures...
research
05/28/2023

Key-Value Transformer

Transformers have emerged as the prevailing standard solution for variou...
research
03/09/2022

CP-ViT: Cascade Vision Transformer Pruning via Progressive Sparsity Prediction

Vision transformer (ViT) has achieved competitive accuracy on a variety ...

Please sign up or login with your details

Forgot password? Click here to reset