Vision Transformers for Mobile Applications: A Short Survey

05/30/2023
by Nahid Alam, et al.

Vision Transformers (ViTs) have demonstrated state-of-the-art performance on many computer vision tasks. Unfortunately, deploying these large-scale ViTs is resource-intensive and infeasible on many mobile devices. While most of the community is building ever-larger ViTs, we ask the opposite question: how small can a ViT be while staying within the accuracy and inference-latency tradeoffs that make it suitable for mobile deployment? We examine several ViTs designed specifically for mobile applications and observe that they either modify the transformer architecture or combine CNNs with transformers. Recent work has also attempted to create sparse ViT networks and has proposed alternatives to the attention module. In this paper, we study these architectures, identify the challenges, and analyze what makes a vision transformer suitable for mobile applications. We aim for this survey to serve as a baseline for future research and to lay the foundation for choosing a suitable vision transformer architecture for an application running on mobile devices.


