A Mathematical Theory of Attention

07/06/2020
by James Vuckovic et al.

Attention is a powerful component of modern neural networks across a wide variety of domains. However, despite its ubiquity in machine learning, there is a gap in our understanding of attention from a theoretical point of view. We propose a framework to fill this gap by building a mathematically equivalent model of attention using measure theory. With this model, we are able to interpret self-attention as a system of self-interacting particles, we shed light on self-attention from a maximum entropy perspective, and we show that attention is actually Lipschitz-continuous (with an appropriate metric) under suitable assumptions. We then apply these insights to the problem of mis-specified input data; infinitely-deep, weight-sharing self-attention networks; and more general Lipschitz estimates for a specific type of attention studied in concurrent work.
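For readers unfamiliar with the object under study, here is a minimal sketch of standard scaled dot-product attention and the measure-theoretic reading alluded to in the abstract. It illustrates the general idea only; the notation and the expectation form are assumptions for exposition, not the paper's exact construction.

% A minimal sketch (not the paper's notation): scaled dot-product attention
% for queries Q, keys K = (k_1, ..., k_n), and values V = (v_1, ..., v_n).
\[
  \operatorname{Att}(Q, K, V) = \operatorname{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V .
\]
% Row-wise, each query's output is an expectation of the values under a
% softmax-weighted probability measure over the keys:
\[
  \operatorname{Att}(q_i, K, V)
  = \sum_{j=1}^{n} p_i(j)\, v_j
  = \mathbb{E}_{j \sim p_i}[v_j],
  \qquad
  p_i(j) = \frac{\exp\!\big(q_i^{\top} k_j / \sqrt{d_k}\big)}
                {\sum_{l=1}^{n} \exp\!\big(q_i^{\top} k_l / \sqrt{d_k}\big)} .
\]
% In self-attention, q_i, k_i, and v_i are all functions of the same token x_i,
% so every token's update depends on the whole collection {x_j}: the
% "self-interacting particles" picture referenced in the abstract.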

Related research

06/08/2020 · The Lipschitz Constant of Self-Attention
Lipschitz constants of neural networks have been explored in various con...

02/10/2021 · On the Regularity of Attention
Attention is a powerful component of modern neural networks across a wid...

04/28/2020 · Exploring Self-attention for Image Recognition
Recent work has shown that self-attention can serve as a basic building ...

06/16/2019 · Theoretical Limitations of Self-Attention in Neural Sequence Models
Transformers are emerging as the new workhorse of NLP, showing great suc...

03/08/2021 · Lipschitz Normalization for Self-Attention Layers with Application to Graph Neural Networks
Attention based neural networks are state of the art in a large range of...

08/23/2023 · FOSA: Full Information Maximum Likelihood (FIML) Optimized Self-Attention Imputation for Missing Data
In data imputation, effectively addressing missing values is pivotal, es...

05/31/2019 · Constructive Type-Logical Supertagging with Self-Attention Networks
We propose a novel application of self-attention networks towards gramma...
