
Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning

06/04/2021
by Jannik Kossen, et al.

We challenge a common assumption underlying most supervised deep learning: that a model makes a prediction depending only on its parameters and the features of a single input. To this end, we introduce a general-purpose deep learning architecture that takes as input the entire dataset instead of processing one datapoint at a time. Our approach uses self-attention to reason about relationships between datapoints explicitly, which can be seen as realizing non-parametric models using parametric attention mechanisms. However, unlike conventional non-parametric models, we let the model learn end-to-end from the data how to make use of other datapoints for prediction. Empirically, our models solve cross-datapoint lookup and complex reasoning tasks unsolvable by traditional deep learning models. We show highly competitive results on tabular data, early results on CIFAR-10, and give insight into how the model makes use of the interactions between points.
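
To make the idea concrete, here is a minimal sketch, assuming PyTorch. It is an illustration rather than the authors' implementation (their full architecture also alternates this step with standard attention between the attributes of each datapoint), and the class name `AttentionBetweenDatapoints` and all dimensions are hypothetical. The whole dataset is treated as a single sequence, so self-attention mixes information across datapoints instead of within one input:

```python
import torch
import torch.nn as nn

class AttentionBetweenDatapoints(nn.Module):
    """Self-attention over the datapoint axis: the representation of each
    datapoint is updated using every other datapoint in the dataset."""

    def __init__(self, hidden_dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_datapoints, hidden_dim) -- one embedding per datapoint.
        seq = x.unsqueeze(0)                   # (1, n, hidden_dim): the dataset as one sequence
        out, _ = self.attn(seq, seq, seq)      # each datapoint attends to all others
        return self.norm(x + out.squeeze(0))   # residual connection + layer norm

# Usage: a prediction for one datapoint can now depend on the rest of the dataset.
layer = AttentionBetweenDatapoints(hidden_dim=64)
dataset = torch.randn(128, 64)  # 128 datapoints, each embedded in 64 dims
print(layer(dataset).shape)     # torch.Size([128, 64])
```

Because the entire dataset enters as one sequence, changing other datapoints can change the prediction for a given point, which is what enables the cross-datapoint lookup behavior described above.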


Related Research

01/25/2018 · Data-Driven Impulse Response Regularization via Deep Learning
We consider the problem of impulse response estimation for stable linear...

10/12/2021 · Relative Molecule Self-Attention Transformer
Self-supervised learning holds promise to revolutionize molecule propert...

06/08/2021 · Staircase Attention for Recurrent Processing of Sequences
Attention mechanisms have become a standard tool for sequence modeling t...

12/08/2021 · A Simple and efficient deep Scanpath Prediction
Visual scanpath is the sequence of fixation points that the human gaze t...

12/09/2022 · Mitigation of Spatial Nonstationarity with Vision Transformers
Spatial nonstationarity, the location variance of features' statistical ...

06/21/2017 · NPGLM: A Non-Parametric Method for Temporal Link Prediction
In this paper, we try to solve the problem of temporal link prediction i...

07/10/2020 · Deep Contextual Clinical Prediction with Reverse Distillation
Healthcare providers are increasingly using learned methods to predict a...

Code Repositories

non-parametric-transformers

Code for "Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning"

