A Comprehensive Capability Analysis of GPT-3 and GPT-3.5 Series Models

03/18/2023
by Junjie Ye, et al.

GPT series models such as GPT-3, CodeX, InstructGPT, and ChatGPT have gained considerable attention for their exceptional natural language processing capabilities. However, although much research compares the capabilities of GPT series models with those of fine-tuned models, little attention has been given to how the capabilities of GPT series models have evolved over time. To analyze these capabilities comprehensively, we select six representative models: two GPT-3 series models (davinci and text-davinci-001) and four GPT-3.5 series models (code-davinci-002, text-davinci-002, text-davinci-003, and gpt-3.5-turbo). We evaluate their performance on nine natural language understanding (NLU) tasks using 21 datasets. In particular, we compare the performance and robustness of the models on each task under zero-shot and few-shot scenarios. Our extensive experiments reveal that the overall ability of GPT series models on NLU tasks does not increase gradually as the models evolve, especially with the introduction of the RLHF training strategy. While this strategy enhances the models' ability to generate human-like responses, it also compromises their ability to solve some tasks. Furthermore, our findings indicate that there is still room for improvement in areas such as model robustness.

research
03/01/2023

How Robust is GPT-3.5 to Predecessors? A Comprehensive Study on Language Understanding Tasks

The GPT-3.5 models have demonstrated impressive performance in various N...
research
07/08/2023

Is ChatGPT a Good Personality Recognizer? A Preliminary Study

In recent years, personality has been regarded as a valuable personal fa...
research
06/29/2023

A negation detection assessment of GPTs: analysis with the xNot360 dataset

Negation is a fundamental aspect of natural language, playing a critical...
research
05/23/2023

ZeroSCROLLS: A Zero-Shot Benchmark for Long Text Understanding

We introduce ZeroSCROLLS, a zero-shot benchmark for natural language und...
research
06/19/2023

Adversarial Robustness of Prompt-based Few-Shot Learning for Natural Language Understanding

State-of-the-art few-shot learning (FSL) methods leverage prompt-based f...
research
05/27/2023

What indeed can GPT models do in chemistry? A comprehensive benchmark on eight tasks

Large Language Models (LLMs) with strong abilities in natural language p...
research
03/14/2023

Exploring ChatGPT's Ability to Rank Content: A Preliminary Study on Consistency with Human Preferences

As a natural language assistant, ChatGPT is capable of performing variou...
