Predicting Multiple ICD-10 Codes from Brazilian-Portuguese Clinical Notes

by   Arthur D. Reys, et al.

ICD coding from electronic clinical records is a manual, time-consuming and expensive process. Code assignment is, however, an important task for billing purposes and database organization. While many works have studied the problem of automated ICD coding from free text using machine learning techniques, most use records in the English language, especially from the MIMIC-III public dataset. This work presents results for a dataset with Brazilian Portuguese clinical notes. We develop and optimize a Logistic Regression model, a Convolutional Neural Network (CNN), a Gated Recurrent Unit Neural Network and a CNN with Attention (CNN-Att) for prediction of diagnosis ICD codes. We also report our results for the MIMIC-III dataset, which outperform previous work among models of the same families, as well as the state of the art. Compared to MIMIC-III, the Brazilian Portuguese dataset contains far fewer words per document, when only discharge summaries are used. We experiment concatenating additional documents available in this dataset, achieving a great boost in performance. The CNN-Att model achieves the best results on both datasets, with micro-averaged F1 score of 0.537 on MIMIC-III and 0.485 on our dataset with additional documents.


page 1

page 2

page 3

page 4


Experimental Evaluation and Development of a Silver-Standard for the MIMIC-III Clinical Coding Dataset

Clinical coding is currently a labour-intensive, error-prone, but critic...

Medical Code Prediction from Discharge Summary: Document to Sequence BERT using Sequence Attention

Clinical notes are unstructured text generated by clinicians during pati...

Automated Medical Coding on MIMIC-III and MIMIC-IV: A Critical Review and Replicability Study

Medical coding is the task of assigning medical codes to clinical free-t...

ICD Coding from Clinical Text Using Multi-Filter Residual Convolutional Neural Network

Automated ICD coding, which assigns the International Classification of ...

An Explainable CNN Approach for Medical Codes Prediction from Clinical Text

Method: We develop CNN-based methods for automatic ICD coding based on c...

Autoencoder-based prediction of ICU clinical codes

Availability of diagnostic codes in Electronic Health Records (EHRs) is ...

Description-based Label Attention Classifier for Explainable ICD-9 Classification

ICD-9 coding is a relevant clinical billing task, where unstructured tex...

Please sign up or login with your details

Forgot password? Click here to reset