Evaluating the Logical Reasoning Ability of ChatGPT and GPT-4

04/07/2023
by Hanmeng Liu, et al.

Harnessing logical reasoning ability is a comprehensive natural language understanding endeavor. With the release of Generative Pretrained Transformer 4 (GPT-4), highlighted as "advanced" at reasoning tasks, we are eager to learn how GPT-4 performs on various logical reasoning tasks. This report analyses multiple logical reasoning datasets, including popular benchmarks such as LogiQA and ReClor and newly released datasets such as AR-LSAT. We test multi-choice reading comprehension and natural language inference tasks on benchmarks that require logical reasoning. We further construct a logical reasoning out-of-distribution dataset to investigate the robustness of ChatGPT and GPT-4, and we compare the performance of the two models. Experimental results show that ChatGPT performs significantly better than the RoBERTa fine-tuning method on most logical reasoning benchmarks. GPT-4 shows even higher performance on our manual tests. Among benchmarks, ChatGPT and GPT-4 do relatively well on well-known datasets such as LogiQA and ReClor. However, performance drops significantly on newly released and out-of-distribution datasets. Logical reasoning remains challenging for ChatGPT and GPT-4, especially on out-of-distribution and natural language inference datasets.
