A survey of automatic de-identification of longitudinal clinical narratives

10/16/2018
by Vithya Yogarajan, et al.

Use of medical data, such as electronic health records, in research helps develop and advance medical science. However, protecting patient confidentiality and identity while using medical data for analysis is crucial. Medical data can take the form of tables, free-form narratives, and images. This study focuses on medical data in the form of free-form longitudinal text. De-identification of electronic health records makes it possible to use such data for research without compromising patient privacy, and avoids the need for individual patient consent. In recent years there has been increasing interest in developing accurate, robust, and adaptable automatic de-identification systems for electronic health records. This interest stems mainly from a dilemma: an abundance of health data is available, yet legal and ethical restrictions prevent its use in research. De-identification tracks in competitions such as the 2014 i2b2 UTHealth and the 2016 CEGS N-GRID shared tasks have provided an excellent platform for advancing this area, chiefly because their datasets are openly available and because the 2016 competition used raw psychiatric data. This study examines notable shifts in the techniques used to build automatic de-identification systems for longitudinal clinical narratives: from systems based solely on conditional random fields (CRF) or solely on rules (regular expressions, dictionaries, or combinations of the two), to hybrid models that combine CRF and rules, and, more recently, to deep learning based systems. We review the literature and results arising from the 2014 and 2016 competitions and discuss the outcomes of these systems. We also provide a list of research questions that emerged from this survey.
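The rule-based and hybrid approaches contrasted above can be illustrated with a small sketch. This is a minimal Python example under stated assumptions, not the method of any surveyed system: the regular expressions, the `NAME_DICTIONARY` gazetteer, the `merge_hybrid` overlap policy (keep the longer span), and the hard-coded span standing in for a CRF or neural tagger's output are all hypothetical and chosen only to show how rules and a statistical model can be combined.

```python
import re
from typing import List, Tuple

# A span is (start, end, label): character offsets, end exclusive.
Span = Tuple[int, int, str]

# Hypothetical rules for illustration only; real rule sets are much larger.
REGEX_RULES = [
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), "DATE"),
    (re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"), "PHONE"),
    (re.compile(r"\bMRN[:\s]*\d+\b"), "ID"),
]

# Hypothetical name gazetteer (the "dictionary" part of a rule-based system).
NAME_DICTIONARY = {"Smith", "Johnson"}


def rule_based_tag(text: str) -> List[Span]:
    """Tag PHI candidates using regular expressions plus a dictionary lookup."""
    spans: List[Span] = []
    for pattern, label in REGEX_RULES:
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), label))
    # Dictionary rule: capitalised words that appear in the gazetteer.
    for m in re.finditer(r"\b[A-Z][a-z]+\b", text):
        if m.group() in NAME_DICTIONARY:
            spans.append((m.start(), m.end(), "NAME"))
    return sorted(spans)


def merge_hybrid(rule_spans: List[Span], model_spans: List[Span]) -> List[Span]:
    """Union rule and model spans; on overlap keep the longer span (one simple policy)."""
    merged: List[Span] = []
    # Sort by start offset, longest first among spans sharing a start.
    for span in sorted(rule_spans + model_spans, key=lambda s: (s[0], s[0] - s[1])):
        if merged and span[0] < merged[-1][1]:
            prev = merged[-1]
            if span[1] - span[0] > prev[1] - prev[0]:
                merged[-1] = span  # the new span is longer, so it replaces the previous one
            continue
        merged.append(span)
    return merged


def redact(text: str, spans: List[Span]) -> str:
    """Replace each span (sorted, non-overlapping) with a [LABEL] placeholder."""
    out, last = [], 0
    for start, end, label in spans:
        out.append(text[last:start])
        out.append(f"[{label}]")
        last = end
    out.append(text[last:])
    return "".join(out)


if __name__ == "__main__":
    note = "Mr Smith met Dr Okafor on 03/14/2012; call 555-123-4567. MRN: 88231"
    # Stand-in for a CRF or neural tagger's output: it finds a name the dictionary misses.
    start = note.index("Okafor")
    model_spans = [(start, start + 6, "NAME")]
    spans = merge_hybrid(rule_based_tag(note), model_spans)
    print(redact(note, spans))
    # prints: Mr [NAME] met Dr [NAME] on [DATE]; call [PHONE]. [ID]
```

Keeping the longer span on overlap is only one possible merge policy; a hybrid system might instead trust one source per PHI category, or feed rule matches to the CRF as features rather than merging outputs afterwards.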

research
06/12/2019

Adversarial Learning of Privacy-Preserving Text Representations for De-Identification of Medical Records

De-identification is the task of detecting protected health information ...
research
01/27/2019

Automatic end-to-end De-identification: Is high accuracy the only metric?

De-identification of electronic health records (EHR) is a vital step tow...
research
06/17/2019

Scrubbing Sensitive PHI Data from Medical Records made Easy by SpaCy -- A Scalable Model Implementation Comparisons

De-identification of clinical records is an extremely important process ...
research
08/23/2021

Medical Graphs in Patient Information Systems in Primary Care

Graphs are very effective tools in visualizing information and are used ...
research
05/09/2023

Spatial Computing Opportunities in Biomedical Decision Support: The Atlas-EHR Vision

Consider the problem of reducing the time needed by healthcare professio...
research
12/13/2022

Foresight – Deep Generative Modelling of Patient Timelines using Electronic Health Records

Electronic Health Records (EHRs) hold detailed longitudinal information ...
research
04/14/2023

Federated and distributed learning applications for electronic health records and structured medical data: A scoping review

Federated learning (FL) has gained popularity in clinical research in re...
