Is ChatGPT Involved in Texts? Measure the Polish Ratio to Detect ChatGPT-Generated Text

by Lingyi Yang, et al.

The remarkable text-generation capabilities of large-scale language models such as ChatGPT have inspired awe and spurred researchers to develop detectors that mitigate potential risks, including misinformation, phishing, and academic dishonesty. However, most previous studies, including HC3, have focused on building detectors that distinguish purely ChatGPT-generated texts from human-authored texts. This approach fails to discern texts produced through human-machine collaboration, such as ChatGPT-polished texts. To address this gap, we introduce a novel dataset, HPPT (ChatGPT-polished academic abstracts), which supports the construction of more robust detectors. Unlike existing corpora, HPPT consists of pairs of human-written and ChatGPT-polished abstracts rather than purely ChatGPT-generated texts. Additionally, we propose the "Polish Ratio", a novel measure of ChatGPT's involvement in text generation based on edit distance, which quantifies the degree of human originality in the resulting text. Our experimental results show that our proposed model achieves better robustness on the HPPT dataset and on two existing datasets (HC3 and CDB). Furthermore, the Polish Ratio offers a more comprehensive explanation by quantifying the degree of ChatGPT involvement: a Polish Ratio greater than 0.2 signifies ChatGPT involvement, and a value exceeding 0.6 implies that ChatGPT generated most of the text.
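The abstract describes the Polish Ratio only as a measure "based on edit distance" and gives two interpretation thresholds (0.2 and 0.6). The sketch below illustrates that idea under an explicit assumption: the ratio is taken to be the word-level Levenshtein distance between the human-written text and its polished version, divided by the length of the longer text, so it falls in [0, 1]. The exact formula used by the authors may differ, and the function names `levenshtein`, `polish_ratio`, and `interpret` are hypothetical.

```python
from typing import Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from an empty prefix of a
    for i, tok_a in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr[j] = min(
                prev[j] + 1,         # delete tok_a
                curr[j - 1] + 1,     # insert tok_b
                prev[j - 1] + cost,  # substitute (or keep a match)
            )
        prev = curr
    return prev[len(b)]


def polish_ratio(original: str, polished: str) -> float:
    """Hypothetical Polish Ratio: word-level edit distance normalized to [0, 1].

    Assumption: normalization by the longer token sequence; the paper's
    definition is only described as "based on editing distance".
    """
    a, b = original.split(), polished.split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return levenshtein(a, b) / longest


def interpret(ratio: float) -> str:
    """Map a ratio onto the two thresholds quoted in the abstract."""
    if ratio > 0.6:
        return "mostly ChatGPT-generated"
    if ratio > 0.2:
        return "ChatGPT involved"
    return "little or no ChatGPT involvement"


if __name__ == "__main__":
    human = "we propose a new dataset for detection"
    polished = "we introduce a novel dataset for detection"
    r = polish_ratio(human, polished)  # 2 substitutions over 7 words
    print(round(r, 3), interpret(r))
```

Working at the word level rather than the character level is a design choice made for this sketch: it counts rewritten words, which matches the intuition of "how much of the text was polished", but a character-level or Jaccard-style variant would be equally plausible.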


A Benchmark Corpus for the Detection of Automatically Generated Text in Academic Publications

Automatic text generation based on neural language models has achieved p...

OUTFOX: LLM-generated Essay Detection through In-context Learning with Adversarially Generated Examples

Large Language Models (LLMs) have achieved human-level fluency in text g...

ChatGPT-4 as a Tool for Reviewing Academic Books in Spanish

This study evaluates the potential of ChatGPT-4, an artificial intellige...

Deepfake Text Detection in the Wild

Recent advances in large language models have enabled them to reach a le...

Check Me If You Can: Detecting ChatGPT-Generated Academic Writing using CheckGPT

With ChatGPT under the spotlight, utilizing large language models (LLMs)...

Towards Codable Text Watermarking for Large Language Models

As large language models (LLMs) generate texts with increasing fluency a...

Methods for Detecting Paraphrase Plagiarism

Paraphrase plagiarism is one of the difficult challenges facing plagiari...
