FinEval: A Chinese Financial Domain Knowledge Evaluation Benchmark for Large Language Models

08/19/2023
by Liwen Zhang, et al.

Large language models (LLMs) have demonstrated exceptional performance in various natural language processing tasks, yet their efficacy in more challenging, domain-specific tasks remains largely unexplored. This paper presents FinEval, a benchmark specifically designed to evaluate financial domain knowledge in LLMs. FinEval is a collection of high-quality multiple-choice questions covering Finance, Economy, Accounting, and Certificate. It includes 4,661 questions spanning 34 different academic subjects. To ensure a comprehensive evaluation of model performance, FinEval employs a range of prompt types, including zero-shot and few-shot prompts, as well as answer-only and chain-of-thought prompts. When state-of-the-art Chinese and English LLMs are evaluated on FinEval, only GPT-4 achieves an accuracy close to 70% across the different prompt settings, indicating significant room for growth in LLMs' financial domain knowledge. Our work offers a more comprehensive financial knowledge evaluation benchmark, drawing on mock exam data and covering a wide range of evaluated LLMs.
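The four prompt settings named in the abstract (zero-shot or few-shot, combined with answer-only or chain-of-thought) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the `Question` fields, the prompt wording, and the helper names `format_question`, `build_prompt`, and `accuracy` are all hypothetical, chosen only to show how the settings differ.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    """A hypothetical multiple-choice item; the field names are assumptions."""
    stem: str
    choices: dict = field(default_factory=dict)  # e.g. {"A": "...", "B": "..."}
    answer: str = ""        # gold label, e.g. "B"
    explanation: str = ""   # worked reasoning, used only in chain-of-thought exemplars


def format_question(q, with_answer=False, cot=False):
    """Render one question, optionally with its answer (for few-shot exemplars)."""
    lines = [q.stem]
    lines += [f"{label}. {text}" for label, text in sorted(q.choices.items())]
    if cot:
        # Chain-of-thought: ask for reasoning before the final answer.
        lines.append("Let's think step by step.")
        if with_answer:
            lines.append(q.explanation)
            lines.append(f"Therefore, the answer is {q.answer}.")
    else:
        # Answer-only: ask for the option label directly.
        lines.append("Answer:" + (f" {q.answer}" if with_answer else ""))
    return "\n".join(lines)


def build_prompt(target, exemplars=(), cot=False):
    """Zero-shot when `exemplars` is empty, few-shot otherwise."""
    blocks = [format_question(e, with_answer=True, cot=cot) for e in exemplars]
    blocks.append(format_question(target, with_answer=False, cot=cot))
    return "\n\n".join(blocks)


def accuracy(predictions, golds):
    """Percentage of predicted labels that match the gold labels."""
    if not golds:
        raise ValueError("golds must not be empty")
    correct = sum(p == g for p, g in zip(predictions, golds))
    return 100.0 * correct / len(golds)
```

Calling `build_prompt(q)` yields a zero-shot answer-only prompt, while `build_prompt(q, exemplars=dev_items, cot=True)` yields a few-shot chain-of-thought prompt, where `dev_items` stands for any small list of solved `Question` objects.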


research
05/21/2023

Evaluating the Performance of Large Language Models on GAOKAO Benchmark

Large language models have demonstrated remarkable performance across va...
research
05/23/2023

CGCE: A Chinese Generative Chat Evaluation Benchmark for General and Financial Domains

Generative chat models, such as ChatGPT and GPT-4, have revolutionized n...
research
08/09/2023

Evaluating the Generation Capabilities of Large Chinese Language Models

This paper presents CG-Eval, the first comprehensive evaluation of the g...
research
04/30/2023

Beyond Classification: Financial Reasoning in State-of-the-Art Language Models

Large Language Models (LLMs), consisting of 100 billion or more paramete...
research
02/16/2023

GLUECons: A Generic Benchmark for Learning Under Constraints

Recent research has shown that integrating domain knowledge into deep le...
research
05/23/2023

Is Information Extraction Solved by ChatGPT? An Analysis of Performance, Evaluation Criteria, Robustness and Errors

ChatGPT has stimulated the research boom in the field of large language ...
research
05/10/2023

Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? An Examination on Several Typical Tasks

The most recent large language models such as ChatGPT and GPT-4 have gar...
