BoardgameQA: A Dataset for Natural Language Reasoning with Contradictory Information

06/13/2023
by Mehran Kazemi, et al.

Automated reasoning over unstructured natural text is a key requirement for many potential applications of NLP and for developing robust AI systems. Recently, Language Models (LMs) have demonstrated complex reasoning capabilities even without any finetuning. However, existing evaluations of automated reasoning assume access to a consistent and coherent set of information over which models reason. When reasoning in the real world, the available information is frequently inconsistent or contradictory, so models need to be equipped with a strategy for resolving such conflicts when they arise. One widely applicable way of resolving conflicts is to impose preferences over information sources (e.g., based on source credibility or information recency) and adopt the conclusion supported by the higher-preference source. In this paper, we formulate the problem of reasoning with contradictory information guided by preferences over sources as the classical problem of defeasible reasoning, and develop a dataset called BoardgameQA for measuring the reasoning capacity of LMs in this setting. BoardgameQA also incorporates reasoning with implicit background knowledge, to better reflect reasoning problems in downstream applications. We benchmark various LMs on BoardgameQA, and the results reveal a significant gap in the reasoning capacity of state-of-the-art LMs on this problem, showing that reasoning with conflicting information does not surface out of the box in LMs. While performance can be improved with finetuning, it nevertheless remains poor.
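The conflict-resolution strategy the abstract describes (impose preferences over sources, then adopt the conclusion backed by the higher-preference source) can be sketched as a toy defeasible-reasoning loop. All rule names, facts, and the `resolve` helper below are illustrative assumptions for this sketch, not part of the BoardgameQA dataset or its reference implementation:

```python
# Minimal sketch of preference-based conflict resolution among
# contradictory rules, in the spirit of defeasible reasoning.
# Facts are plain strings; each rule is (body, head, truth, preference),
# where a higher preference value marks a more trusted source.

def resolve(facts, rules):
    """Fire every rule whose body holds; when two rules derive
    contradictory conclusions about the same head, keep the conclusion
    from the rule with the higher source preference."""
    conclusions = {}  # head -> (truth value, preference of winning rule)
    for body, head, truth, pref in rules:
        if all(f in facts for f in body):
            prev = conclusions.get(head)
            if prev is None or pref > prev[1]:
                conclusions[head] = (truth, pref)
    return {head: truth for head, (truth, _) in conclusions.items()}

facts = {"raven has a sword", "raven is green"}
rules = [
    # (body, head, conclusion truth value, source preference)
    (("raven has a sword",), "raven attacks", True, 1),
    (("raven is green",), "raven attacks", False, 2),  # more credible source
]

print(resolve(facts, rules))  # the higher-preference rule wins: attacks -> False
```

Here both rules fire and contradict each other about `"raven attacks"`; the rule with preference 2 defeats the one with preference 1, mirroring how a preferred source (say, a more recent or more credible one) overrides a less preferred one.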


research
03/26/2023

Natural Language Reasoning, A Survey

This survey paper proposes a clearer view of natural language reasoning ...
research
12/20/2022

LAMBADA: Backward Chaining for Automated Reasoning in Natural Language

Remarkable progress has been made on automated reasoning with knowledge ...
research
12/15/2022

The KITMUS Test: Evaluating Knowledge Integration from Multiple Sources in Natural Language Understanding Systems

Many state-of-the-art natural language understanding (NLU) models are ba...
research
09/20/2023

LLM Guided Inductive Inference for Solving Compositional Problems

While large language models (LLMs) have demonstrated impressive performa...
research
02/07/2023

Reliable Natural Language Understanding with Large Language Models and Answer Set Programming

Humans understand language by extracting information (meaning) from sent...
research
06/14/2022

Understanding Narratives through Dimensions of Analogy

Analogical reasoning is a powerful qualitative reasoning tool that enabl...
research
02/07/2018

Reasoning in a Hierarchical System with Missing Group Size Information

The paper analyzes the problem of judgments or preferences subsequent to...
