We revisit the design choices in Transformers and propose methods to address...
We present a combined scaling method called BASIC that achieves 85.7% zero-shot...
Large Transformer models have been central to recent advances in natural language processing...
With recent progress in joint modeling of visual and textual representations...
Transformers provide a class of expressive architectures that are extremely...
Transformers have attracted increasing interest in computer vision, but...
Transformers have become one of the most important architectural innovations...
With a large amount of parallel data, neural machine translation systems...
With the success of language pretraining, it is highly desirable to develop...
Many training algorithms of a deep neural network can be interpreted as...
We show state-of-the-art word representation learning methods maximize a...
With the capability of modeling bidirectional contexts, denoising autoencoding...
Despite its success, deep learning still needs large labeled datasets to...
With latent variables, stochastic recurrent models have achieved state-of-the-art...
Transformer networks have the potential to learn longer-term dependencies...
When labeled data is scarce for a specific target task, transfer learning...
Mixture of Softmaxes (MoS) has been shown to be effective at addressing ...
In this work, we examine methods for data augmentation for text-based tasks...
In this work, we study the credit assignment problem in reward augmented...
We formulate language modeling as a matrix factorization problem, and show...
Cloze tests are widely adopted in language exams to evaluate students' language...
Learning meaningful representations that maintain the content necessary ...
Semi-supervised learning methods based on generative adversarial networks...
Knowledge bases are important resources for a variety of natural language...
How can we enable computers to automatically answer questions like "Who...