Soft-prompt tuning to predict lung cancer using primary care free-text Dutch medical notes

03/28/2023
by Auke Elfrink, et al.

We investigate different natural language processing (NLP) approaches based on contextualised word representations for the early prediction of lung cancer from the free-text patient medical notes of Dutch primary care physicians. Because lung cancer has a low prevalence in primary care, we also address the problem of classification under highly imbalanced classes. Specifically, we use large Transformer-based pretrained language models (PLMs) and investigate: 1) how soft-prompt tuning – an NLP technique used to adapt PLMs using small amounts of training data – compares to standard model fine-tuning; 2) whether simpler static word embedding models (WEMs) can be more robust than PLMs in highly imbalanced settings; and 3) how models fare when trained on notes from a small number of patients. We find that 1) soft-prompt tuning is an efficient alternative to standard model fine-tuning; 2) PLMs show better discrimination but worse calibration than simpler static word embedding models as the classification problem becomes more imbalanced; and 3) results when training models on a small number of patients are mixed and show no clear differences between PLMs and WEMs. All our code is available as open source at <https://bitbucket.org/aumc-kik/prompt_tuning_cancer_prediction/>.
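The distinction between discrimination and calibration in finding 2 can be illustrated with a small sketch. It is not taken from the paper's code: the function names, the toy labels and scores, and the choice of AUROC (for discrimination) and the Brier score (as a rough stand-in for calibration) are all assumptions made for illustration, and the abstract does not say which metrics the authors used.

```python
def auroc(labels, scores):
    """Discrimination: probability that a random positive case is scored
    higher than a random negative case (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(labels, scores):
    """Mean squared difference between predicted probability and outcome.
    Lower is better; it penalises badly scaled probabilities."""
    return sum((s - y) ** 2 for y, s in zip(labels, scores)) / len(labels)


# Hypothetical imbalanced toy cohort: 2 positive patients out of 20.
labels = [1, 1] + [0] * 18

# Hypothetical model A: ranks patients perfectly, but assigns every
# negative patient a high predicted risk.
overconfident = [0.9, 0.8] + [0.6] * 18

# Hypothetical model B: identical ranking, but predicted risks for the
# negatives sit much closer to the low prevalence.
conservative = [0.5, 0.4] + [0.05] * 18

auroc_a, auroc_b = auroc(labels, overconfident), auroc(labels, conservative)
brier_a, brier_b = brier_score(labels, overconfident), brier_score(labels, conservative)
print(f"model A: AUROC={auroc_a:.2f}, Brier={brier_a:.4f}")
print(f"model B: AUROC={auroc_b:.2f}, Brier={brier_b:.4f}")
```

Both toy models rank the patients identically, so their AUROC is the same, yet model A's inflated probabilities give it a much worse Brier score. This is the kind of gap that can widen as classes become more imbalanced.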


research
09/28/2022

Natural Language Processing Methods to Identify Oncology Patients at High Risk for Acute Care with Clinical Notes

Clinical notes are an essential component of a health record. This paper...
research
03/24/2023

Natural language processing to automatically extract the presence and severity of esophagitis in notes of patients undergoing radiotherapy

Radiotherapy (RT) toxicities can impair survival and quality-of-life, ye...
research
10/17/2022

Using Bottleneck Adapters to Identify Cancer in Clinical Notes under Low-Resource Constraints

Processing information locked within clinical health records is a challe...
research
02/15/2020

Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping

Fine-tuning pretrained contextual word embedding models to supervised do...
research
04/14/2021

Towards BERT-based Automatic ICD Coding: Limitations and Opportunities

Automatic ICD coding is the task of assigning codes from the Internation...
