Ranking medical jargon in electronic health record notes by adapted distant supervision

by Jinying Chen, et al.

Objective: Allowing patients to access their own electronic health record (EHR) notes through online patient portals has the potential to improve patient-centered care. However, medical jargon, which abounds in EHR notes, has been shown to be a barrier for patient EHR comprehension. Existing knowledge bases that link medical jargon to lay terms or definitions play an important role in alleviating this problem but have low coverage of medical jargon in EHRs. We developed a data-driven approach that mines EHRs to identify and rank medical jargon based on its importance to patients, to support the building of EHR-centric lay language resources.

Methods: We developed an innovative adapted distant supervision (ADS) model based on support vector machines to rank medical jargon from EHRs. For distant supervision, we utilized the open-access, collaborative consumer health vocabulary, a large, publicly available resource that links lay terms to medical jargon. We explored both knowledge-based features from the Unified Medical Language System and distributed word representations learned from unlabeled large corpora. We evaluated the ADS model using physician-identified important medical terms.

Results: Our ADS model significantly surpassed two state-of-the-art automatic term recognition methods, TF*IDF and C-Value, yielding 0.810 ROC-AUC versus 0.710 and 0.667, respectively. Our model identified 10K important medical jargon terms after ranking over 100K candidate terms mined from over 7,500 EHR narratives.

Conclusion: Our work is an important step towards enriching lexical resources that link medical jargon to lay terms/definitions to support patient EHR comprehension. The identified medical jargon terms and their rankings are available upon request.
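As a rough illustration of two ideas named in the abstract, the sketch below shows how distant labels might be derived from a vocabulary and how a term ranking could be scored with ROC-AUC against expert judgments. It is a toy under stated assumptions, not the authors' ADS model: the function names, the terms, the scores, and the gold labels are all hypothetical, and no SVM or feature extraction is included.

```python
from typing import Dict, List, Set


def distant_labels(candidates: List[str], vocabulary: Set[str]) -> Dict[str, int]:
    """Noisy labels: 1 if a candidate term appears in the vocabulary, else 0.

    The vocabulary stands in for a resource such as a consumer health
    vocabulary; matching here is a simple case-insensitive lookup (assumption).
    """
    vocab = {term.lower() for term in vocabulary}
    return {term: int(term.lower() in vocab) for term in candidates}


def roc_auc(scores: Dict[str, float], gold: Dict[str, int]) -> float:
    """Probability that a random positive term outscores a random negative one.

    Ties count as half a win, which matches the usual rank-based definition.
    """
    pos = [scores[t] for t, y in gold.items() if y == 1]
    neg = [scores[t] for t, y in gold.items() if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative term")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, invented for illustration only.
candidates = ["hypertension", "metastasis", "patient", "follow up"]
toy_vocabulary = {"Hypertension", "metastasis"}
labels = distant_labels(candidates, toy_vocabulary)

# Hypothetical ranking scores and expert (gold) importance judgments.
toy_scores = {"hypertension": 0.9, "metastasis": 0.4, "patient": 0.6, "follow up": 0.1}
toy_gold = {"hypertension": 1, "metastasis": 1, "patient": 0, "follow up": 0}
auc = roc_auc(toy_scores, toy_gold)
print(labels, auc)
```

In this toy, three of the four positive-negative pairs are ordered correctly, so the ROC-AUC is 0.75. The same pairwise reading applies to the reported 0.810: roughly, a randomly chosen important term outranks a randomly chosen unimportant one about 81% of the time.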




