Unsupervised pre-training of graph transformers on patient population graphs

07/21/2022
by Chantal Pellegrini et al.

Pre-training has shown success in different areas of machine learning, such as computer vision, natural language processing (NLP), and medical imaging, but it has not been fully explored for clinical data analysis. Although an immense amount of clinical records is collected, data and labels can still be scarce in small hospitals or for rare diseases. In such scenarios, pre-training on a larger set of unlabelled clinical data could improve performance. In this paper, we propose novel unsupervised pre-training techniques, inspired by masked language modeling (MLM), that are designed for heterogeneous, multi-modal clinical data and target patient outcome prediction by leveraging graph deep learning over population graphs. To this end, we further propose a graph-transformer-based network designed to handle heterogeneous clinical data. By combining masking-based pre-training with a transformer-based network, we translate the success of masking-based pre-training in other domains to heterogeneous clinical data. We show the benefit of our pre-training method in both a self-supervised and a transfer learning setting, using three medical datasets: TADPOLE, MIMIC-III, and a sepsis prediction dataset. We find that our proposed pre-training methods help to model the data at both the patient and population level and improve performance on different fine-tuning tasks across all datasets.
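The core of the masking-based pre-training described above can be illustrated with a minimal, library-free sketch. The function names (`mask_features`, `reconstruction_loss`), the mask rate, and the mask value are illustrative assumptions, not the paper's actual implementation: a fraction of each patient's feature entries is hidden, and a model would be trained to reconstruct exactly those hidden entries.

```python
import random

def mask_features(x, mask_rate=0.15, mask_value=0.0, rng=None):
    """MLM-style corruption (illustrative sketch, not the paper's code).

    x: list of per-patient feature lists.
    Returns (corrupted, mask), where mask marks the hidden entries;
    the reconstruction loss is computed only on these.
    """
    rng = rng or random.Random(0)
    corrupted, mask = [], []
    for row in x:
        c_row, m_row = [], []
        for value in row:
            hide = rng.random() < mask_rate
            m_row.append(hide)
            c_row.append(mask_value if hide else value)
        corrupted.append(c_row)
        mask.append(m_row)
    return corrupted, mask

def reconstruction_loss(pred, target, mask):
    """Mean squared error over the masked entries only."""
    errs = [(p - t) ** 2
            for p_row, t_row, m_row in zip(pred, target, mask)
            for p, t, m in zip(p_row, t_row, m_row) if m]
    return sum(errs) / len(errs) if errs else 0.0
```

In the actual method, the reconstruction would be produced by the graph transformer operating over the population graph, so that each patient's masked features are predicted from both their own remaining features and those of connected patients.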


Related research

03/23/2022
Unsupervised Pre-Training on Patient Population Graphs for Patient-Level Predictions
Pre-training has shown success in different areas of machine learning, s...

08/31/2021
Medical SANSformers: Training self-supervised transformers without attention for Electronic Medical Records
We leverage deep sequential models to tackle the problem of predicting h...

12/27/2020
MeDAL: Medical Abbreviation Disambiguation Dataset for Natural Language Understanding Pretraining
One of the biggest challenges that prohibit the use of many current NLP ...

02/24/2021
Generalized and Transferable Patient Language Representation for Phenotyping with Limited Data
The paradigm of representation learning through transfer learning has th...

02/08/2021
Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration
Outcome prediction from clinical text can prevent doctors from overlooki...

07/10/2019
LakhNES: Improving multi-instrumental music generation with cross-domain pre-training
We are interested in the task of generating multi-instrumental music sco...
