Finding Stakeholder-Material Information from 10-K Reports using Fine-Tuned BERT and LSTM Models

08/15/2023
by Victor Zitian Chen, et al.

All public companies are required by federal securities law to disclose their business and financial activities in annual 10-K reports. Each report typically spans hundreds of pages, making it difficult for human readers to identify and extract material information efficiently. To address this problem, I fine-tuned BERT models and RNN models with LSTM layers to identify stakeholder-material information, defined as statements that carry information about a company's influence on its stakeholders, including customers, employees, investors, and the community and natural environment. The existing practice uses keyword search to identify such information, which serves as my baseline model. Using business-expert-labeled training data of nearly 6,000 sentences from 62 10-K reports published in 2022, the best model achieved an accuracy of 0.904 and an F1 score of 0.899 on test data, significantly above the baseline model's 0.781 and 0.749, respectively. Furthermore, the same approach was replicated on more granular taxonomies, testing four distinct groups of stakeholders (i.e., customers, investors, employees, and the community and natural environment) separately. Again, the fine-tuned BERT models outperformed the LSTM models and the baseline. Implications for industry application and ideas for future extensions are discussed.
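As a rough illustration of the keyword-search baseline and the reported accuracy/F1 metrics, here is a minimal sketch in plain Python. The keyword list, example sentences, and function names are hypothetical illustrations, not taken from the paper:

```python
# Hypothetical sketch of a keyword-search baseline: a sentence is flagged as
# stakeholder-material if it mentions any stakeholder-related keyword.
# The keyword list below is illustrative, not the paper's actual lexicon.
STAKEHOLDER_KEYWORDS = {"customer", "employee", "investor", "community", "environment"}

def is_material(sentence: str) -> bool:
    """Return True if any word in the sentence contains a stakeholder keyword."""
    words = sentence.lower().split()
    return any(kw in w for w in words for kw in STAKEHOLDER_KEYWORDS)

def accuracy_and_f1(y_true, y_pred):
    """Compute accuracy and F1 from parallel lists of boolean labels."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    tn = sum((not t) and (not p) for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, f1

# Illustrative usage on made-up sentences:
preds = [is_material("We value our employees and customers."),
         is_material("Revenue increased 5% year over year.")]
labels = [True, False]
acc, f1 = accuracy_and_f1(labels, preds)
```

The fine-tuned models in the paper replace `is_material` with a learned sentence classifier, but the same accuracy and F1 computations apply when comparing against this baseline.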
