We demonstrate that Contrastive Decoding – a simple, computationally lig...
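A minimal sketch of the contrastive decoding idea for a single next-token step, assuming access to next-token logits from a large "expert" LM and a small "amateur" LM (toy random tensors stand in for real model outputs); the alpha-based plausibility cutoff is one common formulation and is shown here as an assumption, not the paper's exact recipe.

import math
import torch

def contrastive_decode_step(expert_logits, amateur_logits, alpha=0.1):
    """Pick the next token by the expert-minus-amateur log-probability gap,
    restricted to tokens the expert itself considers plausible."""
    expert_logp = torch.log_softmax(expert_logits, dim=-1)
    amateur_logp = torch.log_softmax(amateur_logits, dim=-1)
    # Plausibility constraint: only tokens whose expert probability is within
    # a factor alpha of the expert's top token remain candidates.
    cutoff = expert_logp.max() + math.log(alpha)
    scores = expert_logp - amateur_logp
    scores = scores.masked_fill(expert_logp < cutoff, float("-inf"))
    return int(scores.argmax())

vocab_size = 50
expert_logits = torch.randn(vocab_size)
amateur_logits = torch.randn(vocab_size)
print(contrastive_decode_step(expert_logits, amateur_logits))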
We present a scalable method to build a high quality instruction followi...
Language models (LMs) often struggle to pay enough attention to the inpu...
Evaluating the factuality of long-form text generated by large language...
Large language models are trained in two stages: (1) unsupervised pretra...
Autoregressive transformers are spectacular models for short sequences b...
Prompt tuning is one of the successful approaches for parameter-efficien...
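Since the abstract above is truncated, the following is only a general sketch of soft prompt tuning: a small matrix of trainable prompt vectors is prepended to frozen token embeddings, so only the prompt parameters are updated. The tiny embedding layer is a hypothetical stand-in for a pretrained model's input embeddings.

import torch
import torch.nn as nn

vocab_size, d_model, prompt_len = 100, 32, 8

frozen_embed = nn.Embedding(vocab_size, d_model)
frozen_embed.weight.requires_grad_(False)          # backbone stays frozen

soft_prompt = nn.Parameter(torch.randn(prompt_len, d_model) * 0.02)

def embed_with_prompt(input_ids):
    tok = frozen_embed(input_ids)                   # (batch, seq, d_model)
    prompt = soft_prompt.unsqueeze(0).expand(input_ids.size(0), -1, -1)
    return torch.cat([prompt, tok], dim=1)          # prompt goes in front

input_ids = torch.randint(0, vocab_size, (2, 10))
print(embed_with_prompt(input_ids).shape)           # torch.Size([2, 18, 32])
# Only `soft_prompt` would be handed to the optimizer during fine-tuning.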
Large language models are typically trained densely: all parameters are...
We introduce REPLUG, a retrieval-augmented language modeling framework t...
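A hedged sketch of REPLUG-style retrieval-augmented ensembling, under the assumption that the frozen LM is run once per retrieved document and the resulting next-token distributions are mixed with weights given by the softmaxed retrieval scores; toy tensors stand in for real retriever and LM outputs.

import torch

def replug_ensemble(retrieval_scores, per_doc_next_token_probs):
    """Weight each document's LM prediction by its (softmaxed) retrieval score
    and sum, yielding a single next-token distribution."""
    weights = torch.softmax(retrieval_scores, dim=0)              # (k,)
    return (weights.unsqueeze(-1) * per_doc_next_token_probs).sum(dim=0)

k, vocab_size = 4, 50
retrieval_scores = torch.randn(k)                  # document/query similarity scores
per_doc_probs = torch.softmax(torch.randn(k, vocab_size), dim=-1)
mixed = replug_ensemble(retrieval_scores, per_doc_probs)
print(mixed.sum())                                  # ~1.0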
We introduce Progressive Prompts - a simple and efficient approach for c...
Despite many recent advancements in language modeling, state-of-the-art...
Large-scale generative models show an impressive ability to perform a wi...
Existing language models (LMs) predict tokens with a softmax over a fini...
Sampling diverse programs from a code language model and reranking with ...
Existing approaches built separate classifiers to detect nonsense in dia...
Recent multimodal models such as DALL-E and CM3 have achieved remarkable...
Likelihood, although useful as a training loss, is a poor search objecti...
We investigate the ability of language models to perform compositional r...
Large language models have been widely adopted but require significant G...
We present Branch-Train-Merge (BTM), a communication-efficient algorithm...
We introduce ART, a new corpus-level autoencoding approach for training ...
State-of-the-art encoder-decoder models (e.g. for machine translation (M...
Creating labeled natural language training data is expensive and require...
We propose a simple and effective re-ranking method for improving passag...
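The abstract above is truncated, so the following is only an illustration of LM-based passage re-ranking in general, not necessarily the paper's exact recipe: each retrieved passage is re-scored by how likely a language model finds the question given that passage, and passages are re-ordered by that score. The question_logprob helper is hypothetical; a toy scorer is included so the example runs.

def rerank(passages, question, question_logprob):
    """Sort passages by the score of the question conditioned on each passage."""
    scored = [(question_logprob(passage, question), passage) for passage in passages]
    return [p for _, p in sorted(scored, reverse=True)]

# Toy scorer standing in for an LM: favors passages sharing words with the question.
def question_logprob(passage, question):
    overlap = len(set(passage.lower().split()) & set(question.lower().split()))
    return overlap - 0.01 * len(passage.split())

passages = ["Paris is the capital of France.",
            "The Nile is a river in Africa.",
            "France borders Spain and Italy."]
print(rerank(passages, "What is the capital of France?", question_logprob)[0])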
Code is seldom written in a single left-to-right pass and is instead rep...
Large language models (LMs) are able to in-context learn – perform a new...
We introduce CM3, a family of causally masked generative models trained ...
We introduce MetaICL (Meta-training for In-Context Learning), a new meta...
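A small sketch of meta-training data formatting in the MetaICL spirit: for one task, k examples are concatenated as demonstrations and the model is trained to predict the output of one held-out example from the same task. The field names and separator are illustrative assumptions.

import random

def format_metaicl_instance(task_examples, k=4, sep="\n"):
    """Sample k demonstrations plus one target example from a single task and
    concatenate them in an in-context format; the model is trained to predict
    only the final output given everything before it."""
    demos = random.sample(task_examples, k + 1)
    *context, target = demos
    prompt = sep.join(f"{x['input']} {x['output']}" for x in context)
    prompt = f"{prompt}{sep}{target['input']}"
    return prompt, target["output"]     # (conditioning text, label to predict)

toy_task = [{"input": f"Review: sample {i} Sentiment:",
             "output": "positive" if i % 2 else "negative"} for i in range(20)]
prompt, label = format_metaicl_instance(toy_task)
print(prompt)
print("->", label)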
Distilling state-of-the-art transformer models into lightweight student...
Multi-task learning with an unbalanced data distribution skews model lea...
Stateful optimizers maintain gradient statistics over time, e.g., the ex...
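Since the abstract is truncated, the following is only a sketch of block-wise quantization, the general mechanism behind keeping optimizer states in 8 bits: the state tensor is split into small blocks, each block is normalized by its own absolute maximum and rounded to int8, so a single outlier only degrades its own block. Block size and layout here are illustrative assumptions.

import torch

def blockwise_quantize(x, block_size=64):
    """Quantize a tensor to int8 with one absmax scale per block."""
    flat = x.flatten()
    pad = (-len(flat)) % block_size
    flat = torch.cat([flat, flat.new_zeros(pad)])
    blocks = flat.view(-1, block_size)
    scales = blocks.abs().max(dim=1, keepdim=True).values.clamp(min=1e-8)
    q = torch.round(blocks / scales * 127).to(torch.int8)
    return q, scales, x.shape, pad

def blockwise_dequantize(q, scales, shape, pad):
    flat = (q.float() / 127 * scales).flatten()
    flat = flat[:len(flat) - pad] if pad else flat
    return flat.view(shape)

state = torch.randn(3, 100)                       # e.g. Adam's first moment
q, s, shape, pad = blockwise_quantize(state)
print((blockwise_dequantize(q, s, shape, pad) - state).abs().max())  # small error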
Since the introduction of the transformer model by Vaswani et al. (2017)...
We introduce a new domain expert mixture (DEMix) layer that enables cond...
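A minimal sketch of a domain-expert-mixture style feedforward layer, assuming conditional computation on a discrete domain id: each domain owns its own feedforward expert, and hidden states are routed to the expert matching the sequence's domain label. Layer sizes and routing granularity are illustrative assumptions.

import torch
import torch.nn as nn

class DomainExpertFFN(nn.Module):
    def __init__(self, d_model=32, d_hidden=64, n_domains=4):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_domains)
        ])

    def forward(self, hidden, domain_id):
        # Every token in this sequence is processed by its domain's expert.
        return self.experts[domain_id](hidden)

layer = DomainExpertFFN()
hidden = torch.randn(1, 10, 32)          # (batch=1, seq, d_model)
print(layer(hidden, domain_id=2).shape)  # torch.Size([1, 10, 32])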
We introduce a noisy channel approach for language model prompting in fe...
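A small sketch of the noisy channel scoring idea: instead of asking the LM for p(label | input), each candidate label's verbalization is used as the conditioning text and the LM scores the input, i.e. p(input | label). The lm_logprob helper is hypothetical; an arbitrary toy scorer is included so the example runs end to end.

def channel_classify(lm_logprob, input_text, verbalizers):
    # Direct prompting scores p(label | input); the channel direction instead
    # asks how well each label's verbalization "explains" the observed input.
    scores = {label: lm_logprob(context=verbalizer, continuation=input_text)
              for label, verbalizer in verbalizers.items()}
    return max(scores, key=scores.get), scores

# Arbitrary stand-in scorer; a real LM would return the summed log-probability
# of `continuation` conditioned on `context`.
def toy_lm_logprob(context, continuation):
    return -abs(sum(map(ord, context)) - sum(map(ord, continuation))) / 100.0

label, scores = channel_classify(
    toy_lm_logprob,
    "the movie was a complete waste of time",
    {"positive": "This review is positive.", "negative": "This review is negative."})
print(label, scores)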
We introduce HTLM, a hyper-text language model trained on a large-scale ...
This paper proposes a pre-training objective based on question answering...
When intelligent agents communicate to accomplish shared goals, how do t...
We introduce a new balanced assignment of experts (BASE) layer for large...
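A hedged sketch of the balanced token-to-expert assignment behind a BASE-style layer: each of E experts receives exactly T/E tokens, chosen by solving a linear assignment problem over token-expert affinity scores. The paper solves this at scale with specialized algorithms; scipy's Hungarian solver is used here purely for clarity.

import numpy as np
from scipy.optimize import linear_sum_assignment

tokens, experts = 16, 4
capacity = tokens // experts                  # tokens per expert

affinity = np.random.randn(tokens, experts)   # token-expert routing scores
# Repeat each expert `capacity` times so the assignment is one-to-one.
cost = -np.repeat(affinity, capacity, axis=1)      # shape (tokens, tokens)
row, col = linear_sum_assignment(cost)
expert_of_token = col // capacity                  # map slot index back to expert

print(np.bincount(expert_of_token, minlength=experts))   # perfectly balanced: [4 4 4 4]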
We explore the benefits of decreasing the input length of transformers. ...
Structured information is an important knowledge source for automatic ve...
We introduce k-nearest-neighbor machine translation (kNN-MT), which pred...
The structured representation for semantic parsing in task-oriented assi...
We propose Grounded Adaptation for Zero-shot Executable Semantic Parsing...
We introduce MARGE, a pre-trained sequence-to-sequence model learned wit...
Large pre-trained language models have been shown to store factual knowl...
Practical applications of abstractive summarization models are limited b...
This paper demonstrates that multilingual denoising pre-training produce...
Inspired by modular software design principles of independence, intercha...
We introduce kNN-LMs, which extend a pre-trained neural language model (...
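A minimal sketch of the kNN-LM interpolation, assuming a datastore of (context representation, next token) pairs built offline: the base LM's next-token distribution is mixed with a distribution over the tokens that followed the k nearest cached contexts. The values of lam, k and temp are illustrative.

import torch

def knn_lm_next_token(p_lm, query, keys, values, vocab_size, k=8, lam=0.25, temp=1.0):
    """Interpolate the base LM distribution with a kNN distribution:
    p = lam * p_knn + (1 - lam) * p_lm."""
    dists = torch.cdist(query.unsqueeze(0), keys).squeeze(0)   # L2 to all datastore keys
    knn_d, knn_i = dists.topk(k, largest=False)
    weights = torch.softmax(-knn_d / temp, dim=0)              # closer contexts weigh more
    p_knn = torch.zeros(vocab_size)
    p_knn.index_add_(0, values[knn_i], weights)                # aggregate weight per token
    return lam * p_knn + (1 - lam) * p_lm

vocab, d, n_entries = 50, 16, 1000
keys = torch.randn(n_entries, d)                 # cached context representations
values = torch.randint(0, vocab, (n_entries,))   # token that followed each context
p_lm = torch.softmax(torch.randn(vocab), dim=0)
query = torch.randn(d)
print(knn_lm_next_token(p_lm, query, keys, values, vocab).sum())   # ~1.0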
We present BART, a denoising autoencoder for pretraining sequence-to-seq...