Soft-prompt tuning to predict lung cancer using primary care free-text Dutch medical notes

by Auke Elfrink, et al.

We investigate different natural language processing (NLP) approaches based on contextualised word representations for the early prediction of lung cancer from the free-text medical notes of Dutch primary care physicians. Because lung cancer has a low prevalence in primary care, we also address the problem of classification under highly imbalanced classes. Specifically, we use large Transformer-based pretrained language models (PLMs) and investigate: 1) how soft-prompt tuning, an NLP technique used to adapt PLMs using small amounts of training data, compares to standard model fine-tuning; 2) whether simpler static word embedding models (WEMs) can be more robust than PLMs in highly imbalanced settings; and 3) how models fare when trained on notes from a small number of patients. We find that 1) soft-prompt tuning is an efficient alternative to standard model fine-tuning; 2) PLMs show better discrimination but worse calibration than WEMs as the classification problem becomes more imbalanced; and 3) results when training models on a small number of patients are mixed and show no clear differences between PLMs and WEMs. All our code is available open source in <>.
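
The second finding contrasts discrimination with calibration, and a toy example can make the distinction concrete. The sketch below is illustrative only: the abstract does not say which metrics were used, so AUROC (for discrimination) and the Brier score (a proper scoring rule that is sensitive to calibration) are assumptions, and the labels and predicted probabilities are made up rather than taken from the study.

```python
# Illustrative sketch. The metric choices (AUROC, Brier score) and all
# numbers below are assumptions, not results or methods from the paper.

def auroc(labels, scores):
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(labels, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


# Hypothetical cohort: 2 positives among 20 patients (10% prevalence).
labels = [1, 1] + [0] * 18

# Hypothetical model A: ranks every positive above every negative,
# but assigns far too much probability to the negatives.
model_a = [0.9, 0.8] + [0.6] * 18

# Hypothetical model B: ranks less well, but its probabilities stay
# close to the true prevalence.
model_b = [0.3, 0.1] + [0.1] * 18

for name, probs in [("A", model_a), ("B", model_b)]:
    print(name, "AUROC:", auroc(labels, probs),
          "Brier:", round(brier_score(labels, probs), 4))
```

In this toy setting model A has the higher AUROC and model B the lower Brier score, which is the kind of trade-off the second finding describes: a model can rank patients well while still producing poorly calibrated probabilities when positives are rare.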




