GPT-3.5 vs GPT-4: Evaluating ChatGPT's Reasoning Performance in Zero-shot Learning

05/21/2023
by Jessica López Espejel, et al.

Large Language Models (LLMs) have exhibited remarkable performance on various Natural Language Processing (NLP) tasks, but their reasoning capacity remains the subject of ongoing debate. In this paper, we examine the performance of the GPT-3.5 and GPT-4 models through a thorough technical evaluation of different reasoning tasks across eleven distinct datasets. Our findings show that GPT-4 outperforms GPT-3.5 in zero-shot learning on almost all evaluated tasks. In addition, we note that both models exhibit limited performance on inductive, mathematical, and multi-hop reasoning tasks. While it may seem intuitive that GPT-4 would outperform GPT-3.5 given its scale and strong results on various NLP tasks, our paper offers empirical evidence to support this claim. We provide a detailed and comprehensive analysis of the results from both models to further support our findings. In addition, we propose a set of engineered prompts that improve the performance of both models in zero-shot settings.
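In practice, the zero-shot setup evaluated here amounts to sending each model the task instruction and the test question alone, with no solved examples in the prompt. The sketch below illustrates this comparison, assuming the pre-v1 `openai` Python client; the model identifiers, sample question, and prompt template are illustrative placeholders, not the authors' exact evaluation code.

```python
# Minimal sketch of a zero-shot comparison between GPT-3.5 and GPT-4,
# assuming the pre-v1 openai Python client. Model names, the sample
# question, and the prompt template are illustrative placeholders.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder credential

MODELS = ["gpt-3.5-turbo", "gpt-4"]

# Zero-shot prompt: only the instruction and the question,
# with no worked examples in the context.
question = (
    "A bat and a ball cost $1.10 in total. The bat costs $1.00 "
    "more than the ball. How much does the ball cost?"
)
prompt = f"Answer the following question.\n\nQuestion: {question}\nAnswer:"

for model in MODELS:
    response = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic decoding for evaluation
    )
    answer = response["choices"][0]["message"]["content"].strip()
    print(f"{model}: {answer}")
```

Holding the prompt and decoding parameters fixed across models, as above, isolates the model as the only varying factor; scoring the returned answers against gold labels over a dataset would then yield the per-task accuracies compared in the paper.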


