Regularizing Transformers With Deep Probabilistic Layers

08/23/2021
by Aurora Cobo Aguilera, et al.

Language models (LMs) have grown non-stop over the last decade, from sequence-to-sequence architectures to the state-of-the-art, purely attention-based Transformers. In this work, we demonstrate how the inclusion of deep generative models within BERT can bring more versatile models, able to impute missing or noisy words with richer text or even improve the BLEU score. More precisely, we use a Gaussian Mixture Variational Autoencoder (GMVAE) as a regularizer layer and prove its effectiveness not only in Transformers but also in the most relevant encoder-decoder-based LMs, seq2seq with and without attention.
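The abstract describes tapping a probabilistic layer inside the network and using its loss as a regularizer during training. The following is a minimal, hypothetical PyTorch sketch of that idea, not the authors' implementation: the GMVAE is simplified to a single-Gaussian VAE bottleneck, and the insertion point, latent size, and loss weight are illustrative assumptions.

# Sketch only: a VAE-style probabilistic layer used as an auxiliary regularizer
# between Transformer encoder blocks. The full GMVAE of the paper is simplified
# here to a single-Gaussian VAE; all sizes and weights are assumptions.
import torch
import torch.nn as nn

class VAERegularizerLayer(nn.Module):
    """Encodes hidden states into a latent Gaussian and decodes them back.
    The reconstruction + KL terms form an auxiliary regularization loss."""
    def __init__(self, d_model=768, d_latent=64):
        super().__init__()
        self.to_mu = nn.Linear(d_model, d_latent)
        self.to_logvar = nn.Linear(d_model, d_latent)
        self.decode = nn.Linear(d_latent, d_model)

    def forward(self, h):
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        recon = self.decode(z)
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        recon_loss = nn.functional.mse_loss(recon, h)
        return recon_loss + kl  # auxiliary loss; hidden states themselves pass through unchanged

class RegularizedEncoder(nn.Module):
    """Plain Transformer encoder with the probabilistic layer tapped after an
    intermediate block; its loss is added to the task loss during training."""
    def __init__(self, d_model=768, n_layers=6, tap_layer=3, beta=0.1):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True) for _ in range(n_layers)]
        )
        self.reg = VAERegularizerLayer(d_model)
        self.tap_layer, self.beta = tap_layer, beta

    def forward(self, x):
        aux_loss = x.new_zeros(())
        for i, layer in enumerate(self.layers):
            x = layer(x)
            if self.training and i == self.tap_layer:
                aux_loss = self.beta * self.reg(x)
        return x, aux_loss

# Illustrative usage: add the auxiliary term to the task loss (e.g., the MLM loss).
# enc = RegularizedEncoder()
# out, aux = enc(torch.randn(2, 16, 768))
# total_loss = task_loss + aux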

