Enhancing Phenotype Recognition in Clinical Notes Using Large Language Models: PhenoBCBERT and PhenoGPT

08/11/2023
by Jingye Yang, et al.

We hypothesize that large language models (LLMs) based on the transformer architecture can enable automated detection of clinical phenotype terms, including terms not documented in the Human Phenotype Ontology (HPO). In this study, we developed two types of models: PhenoBCBERT, a BERT-based model that uses Bio+Clinical BERT as its pre-trained model, and PhenoGPT, a GPT-based model that can be initialized from diverse GPT models, including open-source versions such as GPT-J, Falcon, and LLaMA, as well as closed-source versions such as GPT-3 and GPT-3.5. We compared our methods with PhenoTagger, a recently developed HPO recognition tool that combines rule-based and deep learning methods. We found that our methods can extract more phenotype concepts, including novel ones not characterized by HPO. We also performed case studies on biomedical literature to illustrate how new phenotype information can be recognized and extracted. We compared BERT-based and GPT-based models for phenotype tagging across multiple aspects, including model architecture, memory usage, speed, accuracy, and privacy protection. We also discussed adding a negation step and an HPO normalization layer to the transformer models to improve HPO term tagging. In conclusion, PhenoBCBERT and PhenoGPT enable the automated discovery of phenotype terms from clinical notes and biomedical literature, facilitating automated downstream tasks to derive new biological insights on human diseases.
