Mixture Models for Diverse Machine Translation: Tricks of the Trade

02/20/2019
by Tianxiao Shen, et al.

Mixture models trained via EM are among the simplest, most widely used, and best understood latent variable models in the machine learning literature. Surprisingly, these models have hardly been explored in text generation applications such as machine translation. In principle, they provide a latent variable to control generation and produce a diverse set of hypotheses. In practice, however, mixture models are prone to degeneracies: often only one component gets trained, or the latent variable is simply ignored. We find that disabling dropout noise when computing responsibilities is critical to successful training. In addition, the design choices of parameterization, prior distribution, hard versus soft EM, and online versus offline assignment can dramatically affect model performance. We develop an evaluation protocol to assess both the quality and the diversity of generations against multiple references, and provide an extensive empirical study of several mixture model variants. Our analysis shows that certain types of mixture models are more robust and offer the best trade-off between translation quality and diversity compared to variational models and diverse decoding approaches.
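To make the hard-versus-soft EM distinction concrete, here is a minimal NumPy sketch (not the paper's seq2seq model) of the two responsibility computations, followed by a few hard-EM steps on a toy unit-variance Gaussian mixture with a uniform prior. In the neural case the abstract describes, the analogous per-component negative log-likelihoods would be computed with dropout disabled (i.e. the model in eval mode); all function and variable names here are illustrative.

```python
import numpy as np

def hard_em_responsibilities(nll):
    """Hard EM: assign each example entirely to its best (lowest-NLL) component."""
    r = np.zeros_like(nll)
    r[np.arange(nll.shape[0]), nll.argmin(axis=1)] = 1.0
    return r

def soft_em_responsibilities(nll):
    """Soft EM: posterior over components under a uniform prior (log-sum-exp stabilized)."""
    logp = -nll
    logp = logp - logp.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logp)
    return p / p.sum(axis=1, keepdims=True)

# Toy data: two well-separated clusters.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 0.5, 100), rng.normal(2.0, 0.5, 100)])

# Two components (means only), deliberately initialized close together.
mu = np.array([-0.1, 0.1])
for _ in range(20):
    # E-step: per-example, per-component NLL under a unit-variance Gaussian
    # (constant terms dropped). In a neural mixture this NLL would come from
    # each decoder with dropout turned OFF.
    nll = 0.5 * (x[:, None] - mu[None, :]) ** 2
    r = hard_em_responsibilities(nll)
    # M-step: each component's mean is refit on the examples assigned to it.
    mu = (r * x[:, None]).sum(axis=0) / np.clip(r.sum(axis=0), 1.0, None)
```

With hard EM, each example trains exactly one component; soft EM instead spreads gradient across components in proportion to the posterior, which is one of the design axes (along with the prior and online versus offline assignment) that the paper finds can make or break training.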


