RADAR: Robust AI-Text Detection via Adversarial Learning

by Xiaomeng Hu, et al.
The Chinese University of Hong Kong

Recent advances in large language models (LLMs) and the growing popularity of ChatGPT-like applications have blurred the boundary between human- and machine-generated high-quality text. Beyond the anticipated revolutionary changes to our technology and society, the difficulty of distinguishing LLM-generated text (AI-text) from human-written text poses new challenges of misuse and fairness, such as fake content generation, plagiarism, and false accusation of innocent writers. Existing work shows that current AI-text detectors are not robust to LLM-based paraphrasing; this paper aims to bridge that gap by proposing RADAR, a framework that jointly trains a Robust AI-text Detector via Adversarial leaRning. RADAR adversarially trains a paraphraser and a detector: the paraphraser's goal is to generate realistic content that evades AI-text detection, and RADAR uses the detector's feedback to update the paraphraser, and vice versa. Evaluated with 8 different LLMs (Pythia, Dolly 2.0, Palmyra, Camel, GPT-J, Dolly 1.0, LLaMA, and Vicuna) across 4 datasets, RADAR significantly outperforms existing AI-text detection methods, especially when paraphrasing is in place. We also identify strong transferability of RADAR from instruction-tuned LLMs to other LLMs, and evaluate the improved capability of RADAR via GPT-3.5.
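The abstract describes the paraphraser/detector feedback loop only at a high level. The toy sketch below illustrates the general idea of alternating adversarial updates; it is not RADAR's implementation. In the paper both players are language models, while here, purely as an assumption for illustration, each text is reduced to one hypothetical scalar feature, the "paraphraser" is a single learnable offset `delta`, the detector is logistic regression, both are updated by plain gradient descent (which need not match how RADAR trains its models), and the `lam * delta**2` penalty is an assumed stand-in for keeping paraphrases faithful. All names (`toy_adversarial_training`, `delta`, `lam`, `ai_shift`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def auroc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    wins = 0.0
    for pos in pos_scores:
        for neg in neg_scores:
            wins += 1.0 if pos > neg else 0.5 if pos == neg else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def toy_adversarial_training(rounds=200, n=200, ai_shift=3.0, lr=0.1, lam=1.0, seed=0):
    """Alternate detector and paraphraser updates on a hypothetical 1-D feature.

    Assumed toy setup: human texts ~ N(0, 1), AI texts ~ N(ai_shift, 1),
    and the paraphraser adds a learnable offset `delta` to AI features.
    """
    rng = random.Random(seed)
    human = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ai = [rng.gauss(ai_shift, 1.0) for _ in range(n)]
    w, b, delta = 0.0, 0.0, 0.0  # detector weight/bias, paraphraser offset

    for _ in range(rounds):
        para = [x + delta for x in ai]
        # Detector step: gradient descent on binary cross-entropy, labeling
        # human texts 0 and both original and paraphrased AI texts 1.
        data = [(x, 0.0) for x in human] + [(x, 1.0) for x in ai + para]
        gw = sum((sigmoid(w * x + b) - y) * x for x, y in data) / len(data)
        gb = sum(sigmoid(w * x + b) - y for x, y in data) / len(data)
        w -= lr * gw
        b -= lr * gb

        # Paraphraser step: minimize -log(1 - p_AI) on paraphrased texts, i.e.
        # use the detector's feedback to look more human, plus the assumed
        # fidelity penalty lam * delta^2 that keeps delta bounded.
        mean_p = sum(sigmoid(w * (x + delta) + b) for x in ai) / n
        delta -= lr * (w * mean_p + 2.0 * lam * delta)

    def score(x):
        return sigmoid(w * x + b)

    return {
        "w": w,
        "b": b,
        "delta": delta,
        "auroc_original": auroc([score(x) for x in ai], [score(x) for x in human]),
        "auroc_paraphrased": auroc([score(x + delta) for x in ai], [score(x) for x in human]),
    }
```

Calling `toy_adversarial_training()` and comparing `auroc_original` with `auroc_paraphrased` shows the paraphraser's offset making AI samples harder to separate, while the detector keeps training on both original and paraphrased AI samples. The fidelity penalty matters: without it, the offset in this toy would grow without bound rather than settle.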


Deepfake Text Detection in the Wild

Recent advances in large language models have enabled them to reach a le...

Intrinsic Dimension Estimation for Robust Detection of AI-Generated Texts

Rapidly increasing quality of AI-generated content makes it difficult to...

OUTFOX: LLM-generated Essay Detection through In-context Learning with Adversarially Generated Examples

Large Language Models (LLMs) have achieved human-level fluency in text g...

Provable Robust Watermarking for AI-Generated Text

As AI-generated text increasingly resembles human-written content, the a...

G3Detector: General GPT-Generated Text Detector

The burgeoning progress in the field of Large Language Models (LLMs) her...

Multiscale Positive-Unlabeled Detection of AI-Generated Texts

Recent releases of Large Language Models (LLMs), e.g. ChatGPT, are aston...

MCROOD: Multi-Class Radar Out-Of-Distribution Detection

Out-of-distribution (OOD) detection has recently received special attent...
