A Natural Bias for Language Generation Models

12/19/2022
by Clara Meister, et al.

After just a few hundred training updates, a standard probabilistic model for language generation has likely not yet learnt many semantic or syntactic rules of natural language, which inherently makes it difficult to estimate the right probability distribution over next tokens. Yet around this point, these models have identified a simple, loss-minimising behaviour: outputting the unigram distribution of the target training corpus. The use of such a crude heuristic raises the question: rather than wasting precious compute resources and model capacity on learning this strategy at early training stages, can we initialise our models with this behaviour? Here, we show that we can effectively endow our model with a separate module that reflects unigram frequency statistics as prior knowledge. Standard neural language generation architectures offer a natural opportunity for implementing this idea: by initialising the bias term in a model's final linear layer with the log-unigram distribution. Experiments in neural machine translation demonstrate that this simple technique: (i) improves learning efficiency; (ii) achieves better overall performance; and (iii) appears to disentangle strong frequency effects, encouraging the model to specialise in non-frequency-related aspects of language.
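The core idea is compact enough to sketch in code. Below is a minimal PyTorch illustration, not the authors' implementation: the function name `init_output_bias_with_unigram`, the `token_counts` argument, and the additive smoothing constant are all assumptions made for the example.

```python
import torch
import torch.nn as nn

def init_output_bias_with_unigram(output_layer: nn.Linear,
                                  token_counts: torch.Tensor,
                                  smoothing: float = 1.0) -> None:
    """Set the final projection layer's bias to the log-unigram
    distribution of the training corpus.

    token_counts: 1-D tensor of raw token frequencies (length = vocab size).
    smoothing: additive (Laplace) constant so unseen tokens get a finite
    log-probability. Both arguments are illustrative, not from the paper.
    """
    counts = token_counts.float() + smoothing
    unigram = counts / counts.sum()              # empirical unigram distribution p(w)
    with torch.no_grad():
        output_layer.bias.copy_(unigram.log())   # bias <- log p(w)

# Toy usage: a vocabulary of 5 tokens with made-up counts.
vocab_size, hidden_dim = 5, 16
proj = nn.Linear(hidden_dim, vocab_size, bias=True)
init_output_bias_with_unigram(proj, torch.tensor([100, 50, 25, 20, 5]))
```

Because softmax probabilities depend only on logit differences, a model whose other weights are small at initialisation will output approximately the corpus unigram distribution from the first step, which is exactly the early-training behaviour the abstract describes.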


