A classical problem in computer vision is to infer a 3D scene representa...
Vision Transformers (ViT) have been shown to attain highly competitive p...
Convolutional Neural Networks (CNNs) are the go-to model for computer vi...
Neural Networks require large amounts of memory and compute to process h...
While the Transformer architecture has become the de-facto standard for ...
In this paper, we offer a preliminary investigation into the task of in-...
In this work, we present an empirical study of generation order for mach...
Due to the statistical complexity of video, the high degree of inherent ...
We present KERMIT, a simple insertion-based approach to generative model...
We present the Insertion Transformer, an iterative, partially autoregres...
Deep autoregressive sequence-to-sequence models have demonstrated impres...
Music relies heavily on self-reference to build structure and meaning. W...
Music relies heavily on repetition to build structure and meaning. Self-...
Self-attentive feed-forward sequence models have been shown to achieve i...
Tensor2Tensor is a library for deep learning models that is well-suited ...
Autoregressive sequence models based on deep neural networks, such as RN...
Relying entirely on an attention mechanism, the Transformer introduced b...
Image generation has been successfully cast as an autoregressive sequenc...
Deep learning yields great results across many fields, from speech recog...
The dominant sequence transduction models are based on complex recurrent...
We present a solution to the problem of paraphrase identification of que...
We present a framework for question answering that can efficiently scale...
We propose a simple neural architecture for natural language inference. ...