Predicting Clinical Diagnosis from Patients Electronic Health Records Using BERT-based Neural Networks

by Pavel Blinov, et al.

In this paper we study the problem of predicting clinical diagnoses from textual Electronic Health Record (EHR) data. We show the importance of this problem to the medical community and present a comprehensive historical review of the problem and proposed methods. As our main scientific contributions, we present a modification of the Bidirectional Encoder Representations from Transformers (BERT) model for sequence classification that implements a novel way of composing Fully-Connected (FC) layers, and a BERT model pretrained only on domain data. To empirically validate our models, we use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits. This is the largest such study for the Russian language and one of the largest globally. We performed a number of comparative experiments with other text representation models on the task of multiclass classification over a 265-disease subset of ICD-10. The experiments demonstrate improved performance of our models compared to other baselines, including a fine-tuned Russian BERT (RuBERT) variant. We also show that our model performs comparably to a panel of experienced medical experts, which gives us hope that deploying this system will reduce misdiagnosis.
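The abstract describes a sequence-classification head built from composed fully-connected layers on top of a BERT encoder. The paper's exact layer composition is not given here, so the following is only a minimal illustrative sketch: it assumes a plain stack of FC layers (hidden size 768, one intermediate layer, 265 output classes matching the ICD-10 subset) applied to a pooled BERT embedding such as the `[CLS]` vector. All names (`FCHead`, the chosen dimensions, the GELU activation) are assumptions, not the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def gelu(x):
    # tanh approximation of the GELU activation used in BERT
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

class FCHead:
    """Hypothetical stack of fully-connected layers mapping a pooled BERT
    embedding (e.g. the [CLS] vector) to ICD-10 class logits."""

    def __init__(self, dims):
        # dims, e.g. [768, 256, 265]: BERT hidden size -> intermediate -> classes
        self.weights = [rng.normal(0.0, 0.02, size=(a, b)) for a, b in zip(dims, dims[1:])]
        self.biases = [np.zeros(b) for b in dims[1:]]

    def __call__(self, x):
        for i, (W, b) in enumerate(zip(self.weights, self.biases)):
            x = x @ W + b
            if i < len(self.weights) - 1:  # no activation on the final logit layer
                x = gelu(x)
        return x

# Usage: a stand-in pooled embedding for a single visit record
pooled = rng.normal(size=(1, 768))
head = FCHead([768, 256, 265])
logits = head(pooled)
# softmax over the 265 ICD-10 classes
probs = np.exp(logits - logits.max()) / np.exp(logits - logits.max()).sum()
```

In a real setup the `pooled` vector would come from a domain-pretrained BERT encoder, and the head would be trained jointly with it via cross-entropy over the 265 diagnosis labels.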



