Fast-ParC: Position Aware Global Kernel for ConvNets and ViTs

10/08/2022
by Tao Yang, et al.

Transformer models have made tremendous progress in various fields in recent years. In computer vision, vision transformers (ViTs) have become strong alternatives to convolutional neural networks (ConvNets), yet they have not replaced ConvNets, since each has its own merits. For instance, ViTs are good at extracting global features with their attention mechanism, while ConvNets are more efficient at modeling local relationships thanks to their strong inductive bias. A natural idea is therefore to combine the strengths of ConvNets and ViTs to design new structures. In this paper, we propose a new basic neural network operator named position-aware circular convolution (ParC) and its accelerated version, Fast-ParC. The ParC operator captures global features by using a global kernel and circular convolution, while maintaining location sensitivity through position embeddings. Fast-ParC further reduces the O(n²) time complexity of ParC to O(n log n) using the Fast Fourier Transform. This acceleration makes it possible to use global convolution in the early stages of models with large feature maps, while keeping the overall computational cost comparable to that of 3x3 or 7x7 kernels. The proposed operator can be used in a plug-and-play manner to 1) convert ViTs into pure-ConvNet architectures that enjoy wider hardware support and achieve higher inference speed; 2) replace traditional convolutions in the deep stages of ConvNets to improve accuracy by enlarging the effective receptive field. Experimental results show that our ParC operator effectively enlarges the receptive field of traditional ConvNets, and that adopting it benefits both ViTs and ConvNets on three popular vision tasks: image classification, object detection, and semantic segmentation.
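The core idea above (a global circular convolution made location-aware by position embeddings, then accelerated with the FFT) can be illustrated with a minimal 1D NumPy sketch. This is a simplified toy, not the authors' implementation: the actual ParC/Fast-ParC operators act per-channel on 2D feature maps with learned kernels and position embeddings, and the function names and shapes below are assumptions made purely for illustration.

```python
import numpy as np

def parc_1d(x, kernel, pos_embed):
    """Naive position-aware circular convolution, O(n^2) per channel.

    x:         input sequence of length n
    kernel:    global kernel, also of length n (as long as the input)
    pos_embed: position embedding of length n, added to the input so the
               operator stays location-sensitive
    (Names and the 1D setting are illustrative, not the paper's API.)
    """
    x = x + pos_embed
    n = len(x)
    out = np.empty(n)
    for i in range(n):
        # circular (wrap-around) convolution with a global kernel
        out[i] = sum(kernel[j] * x[(i - j) % n] for j in range(n))
    return out

def fast_parc_1d(x, kernel, pos_embed):
    """FFT-accelerated version: circular convolution becomes elementwise
    multiplication in the frequency domain, giving O(n log n)."""
    x = x + pos_embed
    X = np.fft.rfft(x)
    K = np.fft.rfft(kernel)
    return np.fft.irfft(X * K, n=len(x))

# The two paths agree up to floating-point error.
rng = np.random.default_rng(0)
n = 64
x, k, p = rng.standard_normal((3, n))
assert np.allclose(parc_1d(x, k, p), fast_parc_1d(x, k, p))
```

The closing assertion checks the circular-convolution theorem that makes the FFT path equivalent to the direct one: multiplying the rfft spectra and applying irfft reproduces the O(n²) wrap-around sum in O(n log n), which is what allows a global kernel in early, high-resolution stages at a cost comparable to small kernels.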



Related research:

- ParCNetV2: Oversized Kernel with Enhanced Attention (11/14/2022)
- EdgeFormer: Improving Light-weight ConvNets by Learning from Vision Transformers (03/08/2022)
- FFC-SE: Fast Fourier Convolution for Speech Enhancement (04/06/2022)
- Log-Polar Space Convolution for Convolutional Neural Networks (07/26/2021)
- AbHE: All Attention-based Homography Estimation (12/06/2022)
- Are we ready for a new paradigm shift? A Survey on Visual Deep MLP (11/07/2021)
- SwinFIR: Revisiting the SwinIR with Fast Fourier Convolution and Improved Training for Image Super-Resolution (08/24/2022)
