Deepfake Text Detection in the Wild

by   Yafu Li, et al.
The University of Hong Kong

Recent advances in large language models have enabled them to reach a level of text generation comparable to that of humans. These models show powerful capabilities across a wide range of content, including news article writing, story generation, and scientific writing. Such capability further narrows the gap between human-authored and machine-generated texts, highlighting the importance of deepfake text detection to avoid potential risks such as fake news propagation and plagiarism. However, previous work has been limited in that they testify methods on testbed of specific domains or certain language models. In practical scenarios, the detector faces texts from various domains or LLMs without knowing their sources. To this end, we build a wild testbed by gathering texts from various human writings and deepfake texts generated by different LLMs. Human annotators are only slightly better than random guessing at identifying machine-generated texts. Empirical results on automatic detection methods further showcase the challenges of deepfake text detection in a wild testbed. In addition, out-of-distribution poses a greater challenge for a detector to be employed in realistic application scenarios. We release our resources at


Multiscale Positive-Unlabeled Detection of AI-Generated Texts

Recent releases of Large Language Models (LLMs), e.g. ChatGPT, are aston...

RADAR: Robust AI-Text Detection via Adversarial Learning

Recent advances in large language models (LLMs) and the intensifying pop...

How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection

The introduction of ChatGPT has garnered widespread attention in both ac...

Automatic Extraction of Personality from Text: Challenges and Opportunities

In this study, we examined the possibility to extract personality traits...

Is ChatGPT Involved in Texts? Measure the Polish Ratio to Detect ChatGPT-Generated Text

The remarkable capabilities of large-scale language models, such as Chat...

A Comprehensive Survey of Natural Language Generation Advances from the Perspective of Digital Deception

In recent years there has been substantial growth in the capabilities of...

Please sign up or login with your details

Forgot password? Click here to reset