Centered Self-Attention Layers

06/02/2023
by Ameen Ali, et al.

The self-attention mechanism in transformers and the message-passing mechanism in graph neural networks are applied repeatedly within deep learning architectures. We show that this repeated application inevitably leads to oversmoothing, i.e., to similar representations at the deeper layers for different tokens in transformers and for different nodes in graph neural networks. Based on our analysis, we present a correction term to the aggregating operator of these mechanisms. Empirically, this simple term eliminates much of the oversmoothing problem in visual transformers, obtaining performance in weakly supervised segmentation that surpasses elaborate baseline methods which introduce multiple auxiliary networks and training phases. In graph neural networks, the correction term enables the training of very deep architectures more effectively than many recent solutions to the same problem.
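
To make the idea concrete, below is a minimal sketch of what a centered self-attention layer could look like, assuming the correction term amounts to subtracting the uniform averaging operator (1/n) 11^T from the softmax attention matrix so that stacked layers propagate deviations from the mean token rather than the mean itself. The class name, hyperparameters, and this particular form of the correction are illustrative assumptions, not the authors' released code.

```python
# Hypothetical sketch of a "centered" self-attention layer (PyTorch).
# Assumption: the correction subtracts the uniform averaging component
# 1/n from each attention weight, which removes the smoothing fixed point
# that repeated attention otherwise converges to.

import torch
import torch.nn as nn


class CenteredSelfAttention(nn.Module):
    def __init__(self, dim, num_heads=8):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        # x: (batch, n_tokens, dim)
        b, n, d = x.shape
        qkv = self.qkv(x).reshape(b, n, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)           # each: (b, heads, n, head_dim)

        attn = (q @ k.transpose(-2, -1)) * self.scale  # (b, heads, n, n)
        attn = attn.softmax(dim=-1)

        # Correction term: remove the uniform-averaging component that
        # drives oversmoothing when the layer is stacked many times.
        attn = attn - 1.0 / n

        out = attn @ v                                  # (b, heads, n, head_dim)
        out = out.transpose(1, 2).reshape(b, n, d)
        return self.proj(out)


# Example usage on dummy ViT-sized token embeddings.
x = torch.randn(2, 196, 384)
y = CenteredSelfAttention(384, num_heads=6)(x)
```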
