Questioning the Survey Responses of Large Language Models

06/13/2023
by Ricardo Dominguez-Olmedo, et al.

As large language models increase in capability, researchers have started to conduct surveys of all kinds on these models, with varying scientific motivations. In this work, we examine what we can learn from a model's survey responses on the basis of the well-established American Community Survey (ACS) by the U.S. Census Bureau. Evaluating more than a dozen models, ranging from a few hundred million to ten billion parameters, hundreds of thousands of times each on ACS questions, we systematically establish two dominant patterns. First, smaller models exhibit a significant position and labeling bias, for example, toward survey responses labeled with the letter "A". This A-bias diminishes, albeit slowly, as model size increases. Second, even when this labeling bias is adjusted for through randomized answer ordering, models do not trend toward US population statistics or those of any other cognizable population. Rather, models across the board trend toward uniformly random aggregate statistics over survey responses. This pattern is robust to a variety of prompting strategies, including the de facto standard. Our findings demonstrate that aggregate statistics of a language model's survey responses lack the signals found in human populations. This absence of statistical signal cautions against using survey responses from large language models at the present time.
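As a concrete illustration of the protocol the abstract describes, the sketch below shows one way to pose an ACS-style multiple-choice item with randomized answer ordering and to compare the aggregate response distribution against the uniform distribution. It is a minimal sketch under stated assumptions: the question wording, the `score_letter` hook, and the toy A-biased scorer in the demo are illustrative placeholders, not the authors' code or the official ACS form.

```python
import random
from collections import Counter


def ask_with_randomized_order(question, choices, score_letter, rng):
    """Pose one multiple-choice item with the answer options shuffled and
    return the underlying choice whose letter the scorer ranks highest."""
    order = list(choices)
    rng.shuffle(order)  # randomized answer ordering, the debiasing step described above
    letters = "ABCDEFGH"[: len(order)]
    prompt = question + "\n" + "\n".join(f"{l}. {c}" for l, c in zip(letters, order)) + "\nAnswer:"
    best = max(letters, key=lambda l: score_letter(prompt, l))
    return order[letters.index(best)]


def total_variation_from_uniform(counts, choices):
    """Distance between the aggregate response distribution and the uniform one."""
    n = sum(counts.values())
    u = 1.0 / len(choices)
    return 0.5 * sum(abs(counts[c] / n - u) for c in choices)


if __name__ == "__main__":
    rng = random.Random(0)

    # Illustrative ACS-style item; wording and options are placeholders.
    question = "What is this person's marital status?"
    choices = ["Now married", "Widowed", "Divorced", "Separated", "Never married"]

    # Toy scorer with a hard preference for the label "A", standing in for the
    # labeling bias reported for smaller models. A real backend would instead
    # return the model's log-probability of emitting `letter` after `prompt`.
    a_biased = lambda prompt, letter: 1.0 if letter == "A" else 0.0

    counts = Counter(
        ask_with_randomized_order(question, choices, a_biased, rng) for _ in range(10_000)
    )
    print(counts)
    print("TV distance from uniform:", total_variation_from_uniform(counts, choices))
```

Swapping the toy scorer for a real model's next-token log-probabilities over the option letters preserves the structure of the evaluation: once answer ordering is randomized, only a preference among the underlying choices, rather than among the letters, can move the aggregate statistics away from uniform.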

