Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation

by   Niels Mündler, et al.

Large language models (large LMs) are susceptible to producing text with hallucinated content. Self-contradiction, where the LM generates two contradictory sentences within the same context, is an important form of hallucination. In this work, we present a comprehensive analysis on self-contradiction for state-of-the-art, instruction-tuned LMs, including evaluation, detection, and mitigation. To effectively trigger self-contradictions, we design a framework that constrains LMs to generate appropriate sentence pairs. Our evaluation on these sentence pairs reveals that self-contradictions occur frequently across different LMs for both famous and lesser-known topics. Next, we prompt the LMs to detect self-contradictions. Our results indicate that ChatGPT and GPT-4 are able to accurately identify self-contradictions, while Vicuna-13B struggles to do so. For example, with our best prompting method, ChatGPT achieves 91.0 sentence pairs generated by itself. To automatically mitigate self-contradictions, we develop an iterative algorithm that prompts the LMs to remove the detected self-contradictions from the generated text. Our algorithm successfully revises the text such that self-contradictions are significantly reduced, while maintaining its fluency and informativeness. Importantly, our entire pipeline of triggering, detecting, and mitigating self-contradictions is applicable to black-box LMs and does not require any external grounded knowledge.


page 1

page 2

page 3

page 4


A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation

Recently developed large language models have achieved remarkable succes...

Self-Prompting Large Language Models for Open-Domain QA

Open-Domain Question Answering (ODQA) requires models to answer factoid ...

Testing the limits of natural language models for predicting human language judgments

Neural network language models can serve as computational hypotheses abo...

Zero-Resource Hallucination Prevention for Large Language Models

The prevalent use of large language models (LLMs) in various domains has...

Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP

When trained on large, unfiltered crawls from the internet, language mod...

Ghostbuster: Detecting Text Ghostwritten by Large Language Models

We introduce Ghostbuster, a state-of-the-art system for detecting AI-gen...

SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models

Generative Large Language Models (LLMs) such as GPT-3 are capable of gen...

Please sign up or login with your details

Forgot password? Click here to reset