De-identification of Patient Notes with Recurrent Neural Networks

by Franck Dernoncourt et al.

Objective: Patient notes in electronic health records (EHRs) may contain critical information for medical investigations. However, to protect patient confidentiality, the vast majority of medical investigators can only access de-identified notes. In the United States, the Health Insurance Portability and Accountability Act (HIPAA) defines 18 types of protected health information (PHI) that need to be removed to de-identify patient notes. Manual de-identification is impractical given the size of EHR databases, the limited number of researchers with access to the non-de-identified notes, and the frequent mistakes of human annotators. A reliable automated de-identification system would consequently be of high value.

Materials and Methods: We introduce the first de-identification system based on artificial neural networks (ANNs), which, unlike existing systems, requires no handcrafted features or rules. We compare the performance of the system with state-of-the-art systems on two datasets: the i2b2 2014 de-identification challenge dataset, which is the largest publicly available de-identification dataset, and the MIMIC de-identification dataset, which we assembled and which is twice as large as the i2b2 2014 dataset.

Results: Our ANN model outperforms the state-of-the-art systems. It yields an F1-score of 97.85 on the i2b2 2014 dataset, with a recall of 97.38 and a precision of 97.32, and an F1-score of 99.23 on the MIMIC de-identification dataset, with a recall of 99.25 and a precision of 99.06.

Conclusion: Our findings support the use of ANNs for de-identification of patient notes, as they show better performance than previously published systems while requiring no feature engineering.




