Neural networks require large amounts of memory and compute to process h...
Attention-based architectures have become ubiquitous in machine learning...
We provide a general self-attention formulation to impose group equivari...
Attention layers are widely used in natural language processing (NLP) an...
Recent advances in cross-lingual word embeddings have primarily relied o...
Recent trends of incorporating attention mechanisms in vision have led r...
We consider the problem of path inference: given a path prefix, i.e., a ...
Huge-scale machine learning problems are nowadays tackled by distributed...