Convolution-enhanced Evolving Attention Networks

12/16/2022
by Yujing Wang, et al.

Attention-based neural networks, such as Transformers, have become ubiquitous in numerous applications, including computer vision, natural language processing, and time-series analysis. In all kinds of attention networks, the attention maps are crucial as they encode semantic dependencies between input tokens. However, most existing attention networks perform modeling or reasoning based on representations, wherein the attention maps of different layers are learned separately without explicit interactions. In this paper, we propose a novel and generic evolving attention mechanism, which directly models the evolution of inter-token relationships through a chain of residual convolutional modules. The motivations are twofold. On the one hand, the attention maps in different layers share transferable knowledge, so adding a residual connection can facilitate the information flow of inter-token relationships across layers. On the other hand, there is naturally an evolutionary trend among attention maps at different abstraction levels, so it is beneficial to exploit a dedicated convolution-based module to capture this process. Equipped with the proposed mechanism, convolution-enhanced evolving attention networks achieve superior performance in various applications, including time-series representation, natural language understanding, machine translation, and image classification. On time-series representation tasks in particular, the Evolving Attention-enhanced Dilated Convolutional (EA-DC-) Transformer significantly outperforms state-of-the-art models, achieving an average improvement of 17% over the best SOTA. To the best of our knowledge, this is the first work that explicitly models the layer-wise evolution of attention maps. Our implementation is available at https://github.com/pkuyym/EvolvingAttention
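To make the core idea concrete, here is a minimal NumPy sketch of one evolving-attention step: the previous layer's attention map is passed through a small convolution and combined with the current layer's raw attention scores via a residual mix before normalization. This is an illustrative toy, not the authors' implementation; the names `evolving_attention`, `alpha`, and the single-kernel `conv2d_same` helper are assumptions for exposition (the paper's module operates per head with learned convolutional filters).

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def conv2d_same(a, kernel):
    """Naive sliding-window cross-correlation (what deep-learning
    libraries call 'convolution') with zero padding, 'same' output size."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(a, ((ph, ph), (pw, pw)))
    out = np.zeros_like(a)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def evolving_attention(scores, prev_map, kernel, alpha=0.5):
    """Hypothetical sketch of one evolving-attention step: mix the
    convolved previous-layer attention map with the current layer's
    attention logits (residual combination), then renormalize rows."""
    evolved = alpha * conv2d_same(prev_map, kernel) + (1 - alpha) * scores
    return softmax(evolved, axis=-1)

# Toy usage: a 5-token sequence, a uniform 3x3 smoothing kernel.
rng = np.random.default_rng(0)
scores = rng.normal(size=(5, 5))            # current layer's raw logits
prev_map = softmax(rng.normal(size=(5, 5))) # previous layer's attention map
attn = evolving_attention(scores, prev_map, np.full((3, 3), 1 / 9))
```

Each row of `attn` is a valid attention distribution, so the module can be dropped between consecutive attention layers without changing downstream shapes; the residual path is what lets inter-token relationships flow across layers.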


