Human and Automatic Detection of Generated Text

11/02/2019
by   Daphne Ippolito, et al.

With the advent of generative models with a billion parameters or more, it is now possible to automatically generate vast amounts of human-sounding text. This raises the questions of just how human-like machine-generated text is, and how long a text excerpt needs to be for both humans and automatic discriminators to be able to reliably detect that it was machine-generated. In this paper, we conduct a thorough investigation of how choices such as sampling strategy and text excerpt length can impact the performance of automatic detection methods as well as human raters. We find that the sampling strategies which result in more human-like text according to human raters create distributional differences from human-written text that make detection easy for automatic discriminators.

research
12/24/2022

Real or Fake Text?: Investigating Human Ability to Detect Boundaries Between Human-Written and Machine-Generated Text

As text generated by large language models proliferates, it becomes vita...
research
06/02/2021

Detecting Bot-Generated Text by Characterizing Linguistic Accommodation in Human-Bot Interactions

Language generation models' democratization benefits many domains, from ...
research
11/04/2021

Unsupervised and Distributional Detection of Machine-Generated Text

The power of natural language generation models has provoked a flurry of...
research
10/26/2022

Active Countermeasures for Email Fraud

As a major component of online crime, email-based fraud is a threat that...
research
04/11/2023

Towards an Understanding and Explanation for Mixed-Initiative Artificial Scientific Text Detection

Large language models (LLMs) have gained popularity in various fields fo...
research
04/12/2022

Generating Full Length Wikipedia Biographies: The Impact of Gender Bias on the Retrieval-Based Generation of Women Biographies

Generating factual, long-form text such as Wikipedia articles raises thr...
research
10/13/2022

Machine Generated Text: A Comprehensive Survey of Threat Models and Detection Methods

Advances in natural language generation (NLG) have resulted in machine g...
