Relative Positional Encoding for Transformers with Linear Complexity

05/18/2021
by Antoine Liutkus et al.

Recent advances in Transformer models allow for unprecedented sequence lengths thanks to linear space and time complexity. Meanwhile, relative positional encoding (RPE) has been shown to benefit classical Transformers: it conditions attention on lags between positions rather than on absolute positions. However, RPE is not available for the recent linear variants of the Transformer, because it requires the explicit computation of the attention matrix, which is precisely what such methods avoid. In this paper, we bridge this gap and present Stochastic Positional Encoding (SPE), a way to generate positional encodings that can be used as a drop-in replacement for the classical additive (sinusoidal) encoding and that provably behaves like RPE. The main theoretical contribution is a connection between positional encoding and the cross-covariance structure of correlated Gaussian processes. We illustrate the performance of our approach on the Long-Range Arena benchmark and on music generation.
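To make the cross-covariance idea concrete, here is a minimal NumPy sketch of one way such stochastic positional codes could be drawn: shared Gaussian noise is filtered into a "query" stream and a "key" stream, so that the expected dot product of the codes at positions m and n depends only on the lag m - n. The function name, the Gaussian filter shape, and the demo at the bottom are illustrative assumptions, not the exact construction from the paper.

```python
import numpy as np


def sample_spe(length, num_realizations=64, kernel_size=128, rng=None):
    """Draw stochastic positional codes whose cross-covariance depends only on the lag.

    Shared white Gaussian noise is convolved with a query filter and a key
    filter. Because the noise is stationary and shared, E[qbar[m] . kbar[n]]
    is a function of (m - n) alone, which is the defining property of a
    relative positional kernel. The Gaussian filter below is an arbitrary
    illustrative choice, not the paper's recipe.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    # One independent noise stream per realization, padded so that a 'valid'
    # convolution returns exactly `length` samples.
    noise = rng.standard_normal((num_realizations, length + kernel_size - 1))

    taps = np.arange(kernel_size)
    filt_q = np.exp(-0.5 * ((taps - kernel_size / 2) / (kernel_size / 8)) ** 2)
    filt_k = filt_q  # identical filters give a symmetric relative kernel

    qbar = np.stack([np.convolve(row, filt_q, mode="valid") for row in noise])
    kbar = np.stack([np.convolve(row, filt_k, mode="valid") for row in noise])

    # Normalize so that qbar @ kbar.T is a Monte-Carlo average over realizations.
    scale = np.sqrt(num_realizations)
    return qbar.T / scale, kbar.T / scale  # each of shape (length, num_realizations)


if __name__ == "__main__":
    qbar, kbar = sample_spe(length=512)
    cov = qbar @ kbar.T  # empirical positional kernel, shape (512, 512)
    # Entries at equal lag are (noisily) equal: the kernel is approximately Toeplitz.
    print(round(cov[0, 0], 1), round(cov[200, 200], 1))   # lag 0 vs lag 0
    print(round(cov[0, 50], 1), round(cov[200, 250], 1))  # lag 50 vs lag 50
```

In a linear-attention model, stochastic codes of this kind would be combined with the content queries and keys before the feature map, so the attention matrix never has to be materialized; the paper describes the exact combination, a sinusoidal variant, and variance-reduction details that this sketch omits.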


Related research

Linearized Relative Positional Encoding (07/18/2023)
Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding (06/23/2021)
Beyond Chemical 1D knowledge using Transformers (10/02/2020)
The Devil in Linear Transformer (10/19/2022)
Long Range Arena: A Benchmark for Efficient Transformers (11/08/2020)
Your Transformer May Not be as Powerful as You Expect (05/26/2022)
Demystifying the Better Performance of Position Encoding Variants for Transformer (04/18/2021)
