Choose a Transformer: Fourier or Galerkin

05/31/2021
by Shuhao Cao, et al.

In this paper, we apply the self-attention from the state-of-the-art Transformer in Attention Is All You Need for the first time to a data-driven operator learning problem related to partial differential equations. We put together an effort to explain the heuristics of, and to improve the efficacy of, the self-attention mechanism: we demonstrate that the softmax normalization in the scaled dot-product attention is sufficient but not necessary, and prove the approximation capacity of a linear variant as a Petrov-Galerkin projection. A new layer normalization scheme is proposed to allow a scaling to propagate through attention layers, which helps the model achieve remarkable accuracy in operator learning tasks with unnormalized data. Finally, we present three operator learning experiments: the viscid Burgers' equation, an interface Darcy flow, and an inverse interface coefficient identification problem. All experiments validate the improvements of the newly proposed, simple attention-based operator learner over its softmax-normalized counterparts.
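
For intuition, the sketch below shows one way a softmax-free, linear-complexity attention of the kind the abstract describes can be written in PyTorch: layer normalization is applied to the keys and values instead of applying softmax to QK^T, and the K^T V contraction is formed first so the cost is linear in the sequence length. This is a minimal, single-head illustration, not the authors' reference implementation; the class name, dimensions, and scaling are assumptions made here for clarity.

```python
import torch
import torch.nn as nn

class GalerkinTypeAttention(nn.Module):
    """Minimal softmax-free attention sketch (hypothetical, single head).

    Keys and values are layer-normalized in place of the softmax, and the
    (d x d) contraction K^T V is computed first, so the cost is O(n d^2)
    in the number of grid points n rather than O(n^2 d).
    """

    def __init__(self, d_model: int):
        super().__init__()
        self.to_q = nn.Linear(d_model, d_model)
        self.to_k = nn.Linear(d_model, d_model)
        self.to_v = nn.Linear(d_model, d_model)
        self.norm_k = nn.LayerNorm(d_model)  # normalization replaces softmax
        self.norm_v = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n, d_model), n = number of sample points of the input function
        n = x.shape[1]
        q = self.to_q(x)
        k = self.norm_k(self.to_k(x))
        v = self.norm_v(self.to_v(x))
        # "basis interaction" matrix of shape (d, d), averaged over the n points
        kv = torch.einsum("bnd,bne->bde", k, v) / n
        # project the queries onto that learned basis interaction
        return torch.einsum("bnd,bde->bne", q, kv)

# usage: a batch of 4 functions sampled on 2048 grid points with 64 channels
attn = GalerkinTypeAttention(64)
out = attn(torch.randn(4, 2048, 64))
print(out.shape)  # torch.Size([4, 2048, 64])
```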


Related research

SOFT: Softmax-free Transformer with Linear Complexity (10/22/2021)
Vision transformers (ViTs) have pushed the state-of-the-art for various ...

Transformer for Partial Differential Equations' Operator Learning (05/26/2022)
Data-driven learning of partial differential equations' solution operato...

Scalable Transformer for PDE Surrogate Modeling (05/27/2023)
Transformer has shown state-of-the-art performance on various applicatio...

Sinkformers: Transformers with Doubly Stochastic Attention (10/22/2021)
Attention based models such as Transformers involve pairwise interaction...

Scaling Local Self-Attention For Parameter Efficient Visual Backbones (03/23/2021)
Self-attention has the promise of improving computer vision systems due ...

Lipschitz Normalization for Self-Attention Layers with Application to Graph Neural Networks (03/08/2021)
Attention based neural networks are state of the art in a large range of...

Scaling TransNormer to 175 Billion Parameters (07/27/2023)
We present TransNormerLLM, the first linear attention-based Large Langua...
