Assessing the nature of large language models: A caution against anthropocentrism

by Ann Speed, et al.

Generative AI models garnered a large amount of public attention and speculation with the release of OpenAI's chatbot, ChatGPT. At least two opinion camps exist: one excited about the possibilities these models offer for fundamental changes to human tasks, and another highly concerned about the power these models seem to have. To address these concerns, we assessed GPT-3.5 using standard, normed, and validated cognitive and personality measures. For this seedling project, we developed a battery of tests that allowed us to estimate the boundaries of some of these models' capabilities, how stable those capabilities are over a short period of time, and how they compare to humans. Our results indicate that GPT-3.5 is unlikely to have developed sentience, although its ability to respond to personality inventories is interesting. It did display large variability in both cognitive and personality measures over repeated observations, which would not be expected if it had a human-like personality. Variability notwithstanding, GPT-3.5 displays what in a human would be considered poor mental health, including low self-esteem and marked dissociation from reality, despite upbeat and helpful responses.



