Controlled Hallucinations: Learning to Generate Faithfully from Noisy Data

10/12/2020
by Katja Filippova, et al.

Neural text generation (data-to-text or text-to-text) demonstrates remarkable performance when training data is abundant, which for many applications is not the case. To collect a large corpus of parallel data, heuristic rules are often used, but they inevitably let noise into the data, such as phrases in the output that cannot be explained by the input. Consequently, models pick up on the noise and may hallucinate, i.e., generate fluent but unsupported text. Our contribution is a simple but powerful technique to treat such hallucinations as a controllable aspect of the generated text, without dismissing any input and without modifying the model architecture. On the WikiBio corpus (Lebret et al., 2016), a particularly noisy dataset, we demonstrate the efficacy of the technique in both an automatic and a human evaluation.
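The abstract leaves the exact mechanism to the paper, but one standard way to expose a property as "controllable" without changing the model architecture is to encode it as a control token prepended to the input. The sketch below, in Python, illustrates that idea on a data-to-text pair; the overlap-based hallucination score, the four buckets, and the <hal_k> token names are illustrative assumptions, not the paper's recipe.

from dataclasses import dataclass


@dataclass
class Example:
    source: str  # linearized input, e.g. a flattened infobox
    target: str  # reference text, possibly containing unsupported phrases


def hallucination_score(example: Example) -> float:
    """Fraction of target tokens that never appear in the source.

    A crude proxy: 0.0 means every target token is supported by the
    input, 1.0 means none of them are.
    """
    src_tokens = set(example.source.lower().split())
    tgt_tokens = example.target.lower().split()
    if not tgt_tokens:
        return 0.0
    unsupported = sum(1 for tok in tgt_tokens if tok not in src_tokens)
    return unsupported / len(tgt_tokens)


def control_token(score: float, n_buckets: int = 4) -> str:
    """Map a score in [0, 1] to a discrete control token such as <hal_0>."""
    bucket = min(int(score * n_buckets), n_buckets - 1)
    return f"<hal_{bucket}>"


def to_training_input(example: Example) -> str:
    """Prepend the control token so the model can associate it with
    how unsupported the target it is asked to produce is."""
    return f"{control_token(hallucination_score(example))} {example.source}"


def to_inference_input(source: str) -> str:
    """At test time, request the most faithful behaviour the model has seen."""
    return f"<hal_0> {source}"


if __name__ == "__main__":
    ex = Example(
        source="name: John Smith | occupation: engineer",
        target="John Smith was a famous engineer born in Paris.",
    )
    print(to_training_input(ex))          # "<hal_2> name: John Smith | occupation: engineer"
    print(to_inference_input(ex.source))  # "<hal_0> name: John Smith | occupation: engineer"

During training the model sees the token that matches how unsupported each reference is, so the control signal absorbs the noise rather than the model learning to hallucinate unconditionally; at inference time the token for the most faithful bucket is supplied, steering generation away from unsupported content.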


