Unsupervised Extraction of Phenotypes from Cancer Clinical Notes for Association Studies

04/29/2019
by Stefan G. Stark, et al.

The recent adoption of Electronic Health Records (EHRs) by health care providers has introduced an important source of data that provides detailed and highly specific insights into patient phenotypes over large cohorts. These datasets, in combination with machine learning and statistical approaches, generate new opportunities for research and clinical care. However, many methods require patient representations to be in structured formats, while the information in the EHR is often locked in unstructured text designed for human readability. In this work, we develop a methodology to automatically extract clinical features from clinical narratives in large EHR corpora without the need for prior knowledge. We consider medical terms and sentences appearing in clinical narratives as atomic information units. We propose an efficient clustering strategy suitable for the analysis of large text corpora and use the resulting clusters to represent information about each patient compactly. To demonstrate the utility of our approach, we perform an association study of clinical features with somatic mutation profiles from 4,007 cancer patients and their tumors. We apply the proposed algorithm to a dataset consisting of about 65 thousand documents with a total of about 3.2 million sentences. We identify 341 statistically significant associations between the presence of somatic mutations and clinical features. We annotate these associations according to their novelty and report several known associations. We also propose 32 testable hypotheses where the underlying biological mechanism does not appear to be known but is plausible. These results illustrate that the automated discovery of clinical features is possible and that the joint analysis of clinical and genetic datasets can generate appealing new hypotheses.
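To make the pipeline described in the abstract concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not specify the clustering algorithm or the statistical test, so the sketch swaps in a toy clustering (sentences sharing the same set of lowercase tokens fall into one cluster) and a two-sided Fisher's exact test for the association step. All function names and the toy data are hypothetical.

```python
from collections import defaultdict
from math import comb
import re


def sentence_key(sentence):
    """Toy stand-in for clustering (an assumption, not the paper's method):
    sentences with the same set of lowercase alphabetic tokens share a cluster."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return " ".join(sorted(set(tokens)))


def patient_features(notes):
    """Map each patient to the set of sentence clusters seen in their notes."""
    return {pid: {sentence_key(s) for s in sentences}
            for pid, sentences in notes.items()}


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1))
               if p <= observed * (1 + 1e-9))


def association_p_value(features, mutated, cluster):
    """Test one cluster (clinical feature) against one mutation."""
    table = defaultdict(int)
    for pid, clusters in features.items():
        table[(pid in mutated, cluster in clusters)] += 1
    return fisher_exact_two_sided(table[(True, True)], table[(True, False)],
                                  table[(False, True)], table[(False, False)])


# Hypothetical toy data: four patients, two of whom carry the mutation.
notes = {
    "p1": ["Patient reports chronic cough.", "No fever."],
    "p2": ["Patient reports chronic cough"],
    "p3": ["No fever."],
    "p4": ["No fever.", "Denies pain."],
}
mutated = {"p1", "p2"}

features = patient_features(notes)
cough = sentence_key("patient reports chronic cough")
p_value = association_p_value(features, mutated, cough)
print(f"p-value for cough cluster vs. mutation: {p_value:.3f}")
```

In a study that tests many cluster and mutation pairs, the resulting p-values would also need a multiple-testing correction before any association is called significant.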


