Multi Resolution Analysis (MRA) for Approximate Self-Attention

07/21/2022
by Zhanpeng Zeng, et al.

Transformers have emerged as a preferred model for many tasks in natural language processing and vision. Recent efforts on training and deploying Transformers more efficiently have identified many strategies to approximate the self-attention matrix, a key module in a Transformer architecture. Effective ideas include various prespecified sparsity patterns, low-rank basis expansions, and combinations thereof. In this paper, we revisit classical Multiresolution Analysis (MRA) concepts such as Wavelets, whose potential value in this setting remains underexplored thus far. We show that simple approximations based on empirical feedback, together with design choices informed by modern hardware and implementation challenges, eventually yield an MRA-based approach to self-attention with an excellent performance profile across most criteria of interest. We undertake an extensive set of experiments and demonstrate that this multi-resolution scheme outperforms most efficient self-attention proposals and is favorable for both short and long sequences. Code is available at <https://github.com/mlpen/mra-attention>.
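
The abstract describes the idea only at a high level. The snippet below is a minimal sketch of the general multi-resolution principle, not the authors' implementation (see the linked repository for that): score the attention matrix at a coarse block resolution first, then spend a fixed compute budget refining only the highest-scoring blocks at full resolution. The function name, block size, and refinement budget are illustrative assumptions.

```python
# Minimal sketch of multi-resolution approximate self-attention (illustrative,
# not the method from the paper or repository).
import numpy as np

def mra_attention_sketch(Q, K, V, block=4, budget=8):
    """Approximate softmax(Q K^T / sqrt(d)) V with block-wise multi-resolution scores."""
    n, d = Q.shape
    nb = n // block  # number of blocks per axis (assumes n divisible by block)

    # Low-resolution pass: average queries/keys within each block, score block pairs.
    Qc = Q.reshape(nb, block, d).mean(axis=1)
    Kc = K.reshape(nb, block, d).mean(axis=1)
    coarse = Qc @ Kc.T / np.sqrt(d)            # (nb, nb) block-level scores

    # Pick the `budget` highest-scoring block pairs for high-resolution refinement.
    flat = np.argsort(coarse, axis=None)[::-1][:budget]
    rows, cols = np.unravel_index(flat, coarse.shape)

    # Approximate score matrix: coarse value everywhere, exact scores inside
    # the refined blocks.
    S = np.repeat(np.repeat(coarse, block, axis=0), block, axis=1)
    for r, c in zip(rows, cols):
        rs, cs = r * block, c * block
        S[rs:rs+block, cs:cs+block] = (Q[rs:rs+block] @ K[cs:cs+block].T) / np.sqrt(d)

    # Standard row-wise softmax over the approximate scores, then weight the values.
    P = np.exp(S - S.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)
    return P @ V

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 16, 8
    Q, K, V = rng.standard_normal((3, n, d))
    out = mra_attention_sketch(Q, K, V)
    print(out.shape)  # (16, 8)
```

The point of the sketch is the adaptive allocation of resolution: most of the n x n score matrix is represented at block granularity, and only a small budget of blocks is computed exactly, which is what makes such schemes attractive for long sequences.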

Related research

04/26/2022
Understanding The Robustness in Vision Transformers
Recent studies show that Vision Transformers (ViTs) exhibit strong robust...

02/07/2021
Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention
Transformers have emerged as a powerful tool for a broad range of natura...

01/03/2022
Vision Transformer with Deformable Attention
Transformers have recently shown superior performances on various vision...

04/23/2022
Visual Attention Emerges from Recurrent Sparse Reconstruction
Visual attention helps achieve robust perception under noise, corruption...

07/01/2021
Global Filter Networks for Image Classification
Recent advances in self-attention and pure multi-layer perceptrons (MLP)...

05/22/2023
VanillaNet: the Power of Minimalism in Deep Learning
At the heart of foundation models is the philosophy of "more is differen...

03/02/2022
DCT-Former: Efficient Self-Attention with Discrete Cosine Transform
Since their introduction the Transformer architectures emerged as the dom...
