GPT as Knowledge Worker: A Zero-Shot Evaluation of (AI)CPA Capabilities

01/11/2023
by Jillian Bommarito, et al.

The global economy is increasingly dependent on knowledge workers to meet the needs of public and private organizations. While there is no single definition of knowledge work, organizations and industry groups still attempt to measure individuals' capability to engage in it. The most comprehensive assessment of capability readiness for professional knowledge workers is the Uniform CPA Examination developed by the American Institute of Certified Public Accountants (AICPA). In this paper, we experimentally evaluate OpenAI's `text-davinci-003` and prior versions of GPT on both a sample Regulation (REG) exam and an assessment of over 200 multiple-choice questions based on the AICPA Blueprints for legal, financial, accounting, technology, and ethical tasks. First, we find that `text-davinci-003` achieves a correct rate of 14.4% on a sample REG exam section, significantly underperforming human capabilities on quantitative reasoning in zero-shot prompts. Second, `text-davinci-003` appears to be approaching human-level performance on the Remembering & Understanding and Application skill levels in the Exam absent calculation. For best prompt and parameters, the model answers 57.6% of questions correctly, significantly better than the 25% guessing rate, and its top two answers are correct 82.1% of the time, indicating strong non-entailment. Finally, we find that recent generations of GPT-3 demonstrate material improvements on this assessment, rising from 30% for `text-davinci-001` to 57% for `text-davinci-003`. These findings strongly suggest that large language models have the potential to transform the quality and efficiency of future knowledge work.
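As a rough illustration of how far the reported 57.6 figure sits from chance, the sketch below computes an exact one-sided binomial tail probability. It rests on assumptions that are not stated in the abstract: the question count is taken as exactly 200 (the abstract says only "over 200"), the 25 figure is read as the chance rate for four-option multiple-choice questions, and questions are treated as independent trials. It is not the authors' analysis, and `binomial_upper_tail` is a hypothetical helper name.

```python
from math import comb


def binomial_upper_tail(successes: int, trials: int, p: float) -> float:
    """Return P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Assumption: exactly 200 questions (the abstract says "over 200").
n_questions = 200
# Reported accuracy for the best prompt and parameters.
accuracy = 0.576
# Assumption: 25% is the chance rate for four-option questions.
guess_rate = 0.25

# Number of correct answers implied by the reported accuracy.
n_correct = round(accuracy * n_questions)

# Probability of doing at least this well by guessing alone.
p_value = binomial_upper_tail(n_correct, n_questions, guess_rate)

print(f"correct answers: {n_correct} of {n_questions}")
print(f"one-sided p-value vs. guessing: {p_value:.3e}")
```

Under these assumptions the tail probability is vanishingly small, which is consistent with the abstract's statement that the model performs better than guessing; a larger true question count would only shrink it further.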

research
05/30/2022

Billions of Parameters Are Worth More Than In-domain Training Data: A case study in the Legal Case Entailment Task

Recent work has shown that language models scaled to billions of paramet...
research
12/29/2022

GPT Takes the Bar Exam

Nearly all jurisdictions in the United States require a professional lic...
research
12/02/2022

Legal Prompting: Teaching a Language Model to Think Like a Lawyer

Large language models that are capable of zero or few-shot prompting app...
research
07/26/2023

Mental-LLM: Leveraging Large Language Models for Mental Health Prediction via Online Text Data

Advances in large language models (LLMs) have empowered a variety of app...
research
05/17/2023

Statistical Knowledge Assessment for Generative Language Models

Generative Language Models (GLMs) have demonstrated capabilities to stor...
research
05/10/2023

Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? An Examination on Several Typical Tasks

The most recent large language models such as ChatGPT and GPT-4 have gar...
research
02/13/2023

Can GPT-3 Perform Statutory Reasoning?

Statutory reasoning is the task of reasoning with facts and statutes, wh...
