A Multi-View Joint Learning Framework for Embedding Clinical Codes and Text Using Graph Neural Networks

by Lecheng Kong, et al.

Learning to represent free text is a core task in many clinical machine learning (ML) applications, because clinical text contains observations and plans not otherwise available for inference. State-of-the-art methods use large language models developed with immense computational resources and training data; however, applying these models is challenging because syntax and vocabulary vary widely in clinical free text. Structured information such as International Classification of Diseases (ICD) codes often succinctly abstracts the most important facts of a clinical encounter and yields good predictive performance, but is often less available than clinical text in real-world scenarios. We propose a multi-view learning framework that jointly learns from codes and text, combining the availability and forward-looking nature of text with the better performance of ICD codes. The learned text embeddings can be used as inputs to predictive algorithms without the ICD codes at inference time. Our approach uses a Graph Neural Network (GNN) to process ICD codes and a Bi-LSTM to process text. We apply Deep Canonical Correlation Analysis (DCCA) to encourage the two views to learn similar representations of each patient. In experiments using planned surgical procedure text, our model outperforms BERT models fine-tuned on clinical data, and in experiments using diverse text in MIMIC-III, our model is competitive with a fine-tuned BERT at a tiny fraction of its computational cost.
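To make the role of DCCA in the abstract concrete, the following is a minimal sketch of the correlation objective that CCA-style methods maximize between two views. It is not the authors' implementation: the GNN and Bi-LSTM encoders are replaced by plain arrays standing in for their outputs, and the names `cca_correlation`, `h_code`, `h_text`, and the regularization constant `reg` are assumptions chosen for illustration.

```python
import numpy as np


def _inv_sqrt(mat):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def cca_correlation(h_code, h_text, reg=1e-4):
    """Sum of canonical correlations between two views of the same patients.

    h_code: (n_patients, d1) array, stand-in for code-view embeddings.
    h_text: (n_patients, d2) array, stand-in for text-view embeddings.
    reg:    hypothetical ridge term that keeps the covariances invertible.
    """
    n = h_code.shape[0]
    # Center each view so the products below are covariance estimates.
    a = h_code - h_code.mean(axis=0)
    b = h_text - h_text.mean(axis=0)
    s11 = a.T @ a / (n - 1) + reg * np.eye(a.shape[1])
    s22 = b.T @ b / (n - 1) + reg * np.eye(b.shape[1])
    s12 = a.T @ b / (n - 1)
    # Whitened cross-covariance; its singular values are the canonical correlations.
    t = _inv_sqrt(s11) @ s12 @ _inv_sqrt(s22)
    return np.linalg.svd(t, compute_uv=False).sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    code_view = rng.normal(size=(200, 3))       # synthetic stand-in data
    text_view = code_view @ rng.normal(size=(3, 3)) + 0.1 * rng.normal(size=(200, 3))
    print("total correlation:", cca_correlation(code_view, text_view))
```

In a deep variant, the negative of this quantity would serve as a training loss on the outputs of the two encoders, so that gradient updates pull the text and code representations of each patient toward a shared, highly correlated space. Once trained, only the text encoder is needed at inference time, which matches the abstract's claim that the text embeddings can be used without ICD codes.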




