We demonstrate that Contrastive Decoding – a simple, computationally lig...
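A minimal sketch of the contrastive decoding idea for a single next-token step, assuming access to next-token logits from a large "expert" LM and a small "amateur" LM (toy random tensors stand in for real model outputs); the alpha-based plausibility cutoff is one common formulation and is shown here as an assumption, not the paper's exact recipe.

import math
import torch

def contrastive_decode_step(expert_logits, amateur_logits, alpha=0.1):
    """Pick the next token by the expert-minus-amateur log-probability gap,
    restricted to tokens the expert itself considers plausible."""
    expert_logp = torch.log_softmax(expert_logits, dim=-1)
    amateur_logp = torch.log_softmax(amateur_logits, dim=-1)
    # Plausibility constraint: only tokens whose expert probability is within
    # a factor alpha of the expert's top token remain candidates.
    cutoff = expert_logp.max() + math.log(alpha)
    scores = expert_logp - amateur_logp
    scores = scores.masked_fill(expert_logp < cutoff, float("-inf"))
    return int(scores.argmax())

vocab_size = 50
expert_logits = torch.randn(vocab_size)
amateur_logits = torch.randn(vocab_size)
print(contrastive_decode_step(expert_logits, amateur_logits))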
We present a scalable method to build a high quality instruction followi...
Language models (LMs) often struggle to pay enough attention to the inpu...
Evaluating the factuality of long-form text generated by large language...
Large language models are trained in two stages: (1) unsupervised pretra...
Autoregressive transformers are spectacular models for short sequences b...
Prompt tuning is one of the successful approaches for parameter-efficien...
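Since the abstract above is truncated, the following is only a general sketch of soft prompt tuning: a small matrix of trainable prompt vectors is prepended to frozen token embeddings, so only the prompt parameters are updated. The tiny embedding layer is a hypothetical stand-in for a pretrained model's input embeddings.

import torch
import torch.nn as nn

vocab_size, d_model, prompt_len = 100, 32, 8

frozen_embed = nn.Embedding(vocab_size, d_model)
frozen_embed.weight.requires_grad_(False)          # backbone stays frozen

soft_prompt = nn.Parameter(torch.randn(prompt_len, d_model) * 0.02)

def embed_with_prompt(input_ids):
    tok = frozen_embed(input_ids)                   # (batch, seq, d_model)
    prompt = soft_prompt.unsqueeze(0).expand(input_ids.size(0), -1, -1)
    return torch.cat([prompt, tok], dim=1)          # prompt goes in front

input_ids = torch.randint(0, vocab_size, (2, 10))
print(embed_with_prompt(input_ids).shape)           # torch.Size([2, 18, 32])
# Only `soft_prompt` would be handed to the optimizer during fine-tuning.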
Large language models are typically trained densely: all parameters are...
We introduce REPLUG, a retrieval-augmented language modeling framework t...
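A hedged sketch of REPLUG-style retrieval-augmented ensembling, under the assumption that the frozen LM is run once per retrieved document and the resulting next-token distributions are mixed with weights given by the softmaxed retrieval scores; toy tensors stand in for real retriever and LM outputs.

import torch

def replug_ensemble(retrieval_scores, per_doc_next_token_probs):
    """Weight each document's LM prediction by its (softmaxed) retrieval score
    and sum, yielding a single next-token distribution."""
    weights = torch.softmax(retrieval_scores, dim=0)              # (k,)
    return (weights.unsqueeze(-1) * per_doc_next_token_probs).sum(dim=0)

k, vocab_size = 4, 50
retrieval_scores = torch.randn(k)                  # document/query similarity scores
per_doc_probs = torch.softmax(torch.randn(k, vocab_size), dim=-1)
mixed = replug_ensemble(retrieval_scores, per_doc_probs)
print(mixed.sum())                                  # ~1.0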
We introduce Progressive Prompts - a simple and efficient approach for c...
Despite many recent advancements in language modeling, state-of-the-art...
Large-scale generative models show an impressive ability to perform a wi...
Existing language models (LMs) predict tokens with a softmax over a fini...
Sampling diverse programs from a code language model and reranking with ...
Existing approaches built separate classifiers to detect nonsense in dia...
Recent multimodal models such as DALL-E and CM3 have achieved remarkable...
Likelihood, although useful as a training loss, is a poor search objecti...
We investigate the ability of language models to perform compositional r...
Large language models have been widely adopted but require significant G...
We present Branch-Train-Merge (BTM), a communication-efficient algorithm...
We introduce ART, a new corpus-level autoencoding approach for training ...
State-of-the-art encoder-decoder models (e.g. for machine translation (M...
Creating labeled natural language training data is expensive and require...
We propose a simple and effective re-ranking method for improving passag...
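The abstract above is truncated, so the following is only an illustration of LM-based passage re-ranking in general, not necessarily the paper's exact recipe: each retrieved passage is re-scored by how likely a language model finds the question given that passage, and passages are re-ordered by that score. The question_logprob helper is hypothetical; a toy scorer is included so the example runs.

def rerank(passages, question, question_logprob):
    """Sort passages by the score of the question conditioned on each passage."""
    scored = [(question_logprob(passage, question), passage) for passage in passages]
    return [p for _, p in sorted(scored, reverse=True)]

# Toy scorer standing in for an LM: favors passages sharing words with the question.
def question_logprob(passage, question):
    overlap = len(set(passage.lower().split()) & set(question.lower().split()))
    return overlap - 0.01 * len(passage.split())

passages = ["Paris is the capital of France.",
            "The Nile is a river in Africa.",
            "France borders Spain and Italy."]
print(rerank(passages, "What is the capital of France?", question_logprob)[0])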
Code is seldom written in a single left-to-right pass and is instead rep...
Large language models (LMs) are able to in-context learn – perform a new...
We introduce CM3, a family of causally masked generative models trained ...
We introduce MetaICL (Meta-training for In-Context Learning), a new meta...
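A small sketch of meta-training data formatting in the MetaICL spirit: for one task, k examples are concatenated as demonstrations and the model is trained to predict the output of one held-out example from the same task. The field names and separator are illustrative assumptions.

import random

def format_metaicl_instance(task_examples, k=4, sep="\n"):
    """Sample k demonstrations plus one target example from a single task and
    concatenate them in an in-context format; the model is trained to predict
    only the final output given everything before it."""
    demos = random.sample(task_examples, k + 1)
    *context, target = demos
    prompt = sep.join(f"{x['input']} {x['output']}" for x in context)
    prompt = f"{prompt}{sep}{target['input']}"
    return prompt, target["output"]     # (conditioning text, label to predict)

toy_task = [{"input": f"Review: sample {i} Sentiment:",
             "output": "positive" if i % 2 else "negative"} for i in range(20)]
prompt, label = format_metaicl_instance(toy_task)
print(prompt)
print("->", label)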
Distilling state-of-the-art transformer models into lightweight student...
Multi-task learning with an unbalanced data distribution skews model lea...
Stateful optimizers maintain gradient statistics over time, e.g., the ex...
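Since the abstract is truncated, the following is only a sketch of block-wise quantization, the general mechanism behind keeping optimizer states in 8 bits: the state tensor is split into small blocks, each block is normalized by its own absolute maximum and rounded to int8, so a single outlier only degrades its own block. Block size and layout here are illustrative assumptions.

import torch

def blockwise_quantize(x, block_size=64):
    """Quantize a tensor to int8 with one absmax scale per block."""
    flat = x.flatten()
    pad = (-len(flat)) % block_size
    flat = torch.cat([flat, flat.new_zeros(pad)])
    blocks = flat.view(-1, block_size)
    scales = blocks.abs().max(dim=1, keepdim=True).values.clamp(min=1e-8)
    q = torch.round(blocks / scales * 127).to(torch.int8)
    return q, scales, x.shape, pad

def blockwise_dequantize(q, scales, shape, pad):
    flat = (q.float() / 127 * scales).flatten()
    flat = flat[:len(flat) - pad] if pad else flat
    return flat.view(shape)

state = torch.randn(3, 100)                       # e.g. Adam's first moment
q, s, shape, pad = blockwise_quantize(state)
print((blockwise_dequantize(q, s, shape, pad) - state).abs().max())  # small error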
Since the introduction of the transformer model by Vaswani et al. (2017)...
We introduce a new domain expert mixture (DEMix) layer that enables cond...
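A minimal sketch of a domain-expert-mixture style feedforward layer, assuming conditional computation on a discrete domain id: each domain owns its own feedforward expert, and hidden states are routed to the expert matching the sequence's domain label. Layer sizes and routing granularity are illustrative assumptions.

import torch
import torch.nn as nn

class DomainExpertFFN(nn.Module):
    def __init__(self, d_model=32, d_hidden=64, n_domains=4):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_domains)
        ])

    def forward(self, hidden, domain_id):
        # Every token in this sequence is processed by its domain's expert.
        return self.experts[domain_id](hidden)

layer = DomainExpertFFN()
hidden = torch.randn(1, 10, 32)          # (batch=1, seq, d_model)
print(layer(hidden, domain_id=2).shape)  # torch.Size([1, 10, 32])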
We introduce a noisy channel approach for language model prompting in fe...
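A small sketch of the noisy channel scoring idea: instead of asking the LM for p(label | input), each candidate label's verbalization is used as the conditioning text and the LM scores the input, i.e. p(input | label). The lm_logprob helper is hypothetical; an arbitrary toy scorer is included so the example runs end to end.

def channel_classify(lm_logprob, input_text, verbalizers):
    # Direct prompting scores p(label | input); the channel direction instead
    # asks how well each label's verbalization "explains" the observed input.
    scores = {label: lm_logprob(context=verbalizer, continuation=input_text)
              for label, verbalizer in verbalizers.items()}
    return max(scores, key=scores.get), scores

# Arbitrary stand-in scorer; a real LM would return the summed log-probability
# of `continuation` conditioned on `context`.
def toy_lm_logprob(context, continuation):
    return -abs(sum(map(ord, context)) - sum(map(ord, continuation))) / 100.0

label, scores = channel_classify(
    toy_lm_logprob,
    "the movie was a complete waste of time",
    {"positive": "This review is positive.", "negative": "This review is negative."})
print(label, scores)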
We introduce HTLM, a hyper-text language model trained on a large-scale ...
This paper proposes a pre-training objective based on question answering...
When intelligent agents communicate to accomplish shared goals, how do t...
We introduce a new balanced assignment of experts (BASE) layer for large...
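A hedged sketch of the balanced token-to-expert assignment behind a BASE-style layer: each of E experts receives exactly T/E tokens, chosen by solving a linear assignment problem over token-expert affinity scores. The paper solves this at scale with specialized algorithms; scipy's Hungarian solver is used here purely for clarity.

import numpy as np
from scipy.optimize import linear_sum_assignment

tokens, experts = 16, 4
capacity = tokens // experts                  # tokens per expert

affinity = np.random.randn(tokens, experts)   # token-expert routing scores
# Repeat each expert `capacity` times so the assignment is one-to-one.
cost = -np.repeat(affinity, capacity, axis=1)      # shape (tokens, tokens)
row, col = linear_sum_assignment(cost)
expert_of_token = col // capacity                  # map slot index back to expert

print(np.bincount(expert_of_token, minlength=experts))   # perfectly balanced: [4 4 4 4]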
We explore the benefits of decreasing the input length of transformers. ...
Structured information is an important knowledge source for automatic ve...
We introduce k-nearest-neighbor machine translation (kNN-MT), which pred...
The structured representation for semantic parsing in task-oriented assi...
We propose Grounded Adaptation for Zero-shot Executable Semantic Parsing...
We introduce MARGE, a pre-trained sequence-to-sequence model learned wit...
Large pre-trained language models have been shown to store factual knowl...
Practical applications of abstractive summarization models are limited b...
This paper demonstrates that multilingual denoising pre-training produce...
Inspired by modular software design principles of independence, intercha...
We introduce kNN-LMs, which extend a pre-trained neural language model (...
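A minimal sketch of the kNN-LM interpolation, assuming a datastore of (context representation, next token) pairs built offline: the base LM's next-token distribution is mixed with a distribution over the tokens that followed the k nearest cached contexts. The values of lam, k and temp are illustrative.

import torch

def knn_lm_next_token(p_lm, query, keys, values, vocab_size, k=8, lam=0.25, temp=1.0):
    """Interpolate the base LM distribution with a kNN distribution:
    p = lam * p_knn + (1 - lam) * p_lm."""
    dists = torch.cdist(query.unsqueeze(0), keys).squeeze(0)   # L2 to all datastore keys
    knn_d, knn_i = dists.topk(k, largest=False)
    weights = torch.softmax(-knn_d / temp, dim=0)              # closer contexts weigh more
    p_knn = torch.zeros(vocab_size)
    p_knn.index_add_(0, values[knn_i], weights)                # aggregate weight per token
    return lam * p_knn + (1 - lam) * p_lm

vocab, d, n_entries = 50, 16, 1000
keys = torch.randn(n_entries, d)                 # cached context representations
values = torch.randint(0, vocab, (n_entries,))   # token that followed each context
p_lm = torch.softmax(torch.randn(vocab), dim=0)
query = torch.randn(d)
print(knn_lm_next_token(p_lm, query, keys, values, vocab).sum())   # ~1.0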
We present BART, a denoising autoencoder for pretraining sequence-to-seq...