Large Language Models Are Not Abstract Reasoners

by Gaël Gendron et al.

Large Language Models (LLMs) have shown tremendous performance on a wide variety of natural language processing tasks, ranging from text comprehension to commonsense reasoning. However, the mechanisms responsible for this success remain unknown, and it is unclear whether LLMs can achieve human-like cognitive capabilities or whether these models are still fundamentally limited. Abstract reasoning is a fundamental task for cognition, consisting of finding and applying a general pattern from only a few examples. Evaluating deep neural architectures on this task could give insight into their potential limitations regarding reasoning and their broad generalisation abilities, yet this is currently an under-explored area. In this paper, we perform extensive evaluations of state-of-the-art LLMs on abstract reasoning tasks, showing that they achieve very limited performance in contrast with other natural language tasks, and we investigate the reasons for this difference. We apply techniques that have been shown to improve performance on other NLP tasks and show that, in most cases, their impact on abstract reasoning performance is limited. In the course of this work, we also introduce a new benchmark for evaluating language models on abstract reasoning tasks.
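To make the task setting concrete, the following is a minimal, hypothetical sketch of the abstract-reasoning setup described above: a task provides a few input/output demonstrations of a hidden rule, and the solver must induce the rule and apply it to a new input. The rule set and function names here are illustrative assumptions, not the paper's actual benchmark; an LLM evaluation would present the demonstrations in a prompt instead of enumerating rules.

```python
# Toy abstract-reasoning task: induce a hidden transformation from a few
# demonstrations, then apply it to an unseen input.
# (Hypothetical illustration; not the benchmark used in the paper.)

def solve(demos, test_input):
    """Check a small pool of candidate rules against the demonstrations
    and apply whichever rule explains all of them."""
    candidate_rules = [
        lambda xs: xs[::-1],             # reverse the sequence
        lambda xs: [x + 1 for x in xs],  # increment each element
    ]
    for rule in candidate_rules:
        if all(rule(x) == y for x, y in demos):
            return rule(test_input)
    return None  # no candidate rule fits the demonstrations

# Two demonstrations of a hidden rule (here: reversal), one test input.
demos = [([1, 2, 3], [3, 2, 1]), ([4, 5], [5, 4])]
print(solve(demos, [7, 8, 9]))  # -> [9, 8, 7]
```

Evaluation then reduces to exact-match scoring of the predicted output against the ground truth, which is why performance on such tasks is sensitive to whether the general pattern was truly induced rather than memorised.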
