Using Machine Learning and Natural Language Processing to Review and Classify the Medical Literature on Cancer Susceptibility Genes

04/24/2019
by Yujia Bao, et al.
PURPOSE: The medical literature relevant to germline genetics is growing exponentially. Clinicians need tools for monitoring and prioritizing the literature to understand the clinical implications of pathogenic genetic variants. We developed and evaluated two machine learning models to classify abstracts as relevant to the penetrance (risk of cancer for germline mutation carriers) or prevalence of germline genetic mutations.

METHODS: We conducted literature searches in PubMed and retrieved paper titles and abstracts to create an annotated dataset for training and evaluating the two machine learning classification models. Our first model is a support vector machine (SVM), which learns a linear decision rule based on the bag-of-ngrams representation of each title and abstract. Our second model is a convolutional neural network (CNN), which learns a complex nonlinear decision rule based on the raw title and abstract. We evaluated the performance of the two models on the classification of papers as relevant to penetrance or prevalence.

RESULTS: For penetrance classification, we annotated 3740 paper titles and abstracts and used 60% for training the model, 20% [...]. The SVM model achieves 89.53% accuracy ([...] correctly classified), while the CNN model achieves 88.95% accuracy. For prevalence classification, we annotated 3753 paper titles and abstracts. The SVM model achieves 89.14% accuracy.

CONCLUSION: Our models achieve high accuracy in classifying abstracts as relevant to penetrance or prevalence. By facilitating literature review, this tool could help clinicians and researchers keep abreast of the burgeoning knowledge of gene-cancer associations and keep the knowledge bases for clinical decision support tools up to date.
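To make the SVM approach concrete, the following is a minimal sketch of a linear classifier over a bag-of-ngrams representation, trained with plain subgradient steps on the regularized hinge loss. It is not the authors' implementation: the function names, the hyperparameters (`n_max`, `epochs`, `eta`, `lam`), and the four toy abstracts are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def ngram_bag(text, n_max=2):
    """Return a bag (Counter) of word n-grams of length 1..n_max."""
    tokens = text.lower().split()
    bag = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            bag[" ".join(tokens[i:i + n])] += 1
    return bag


def score(w, b, bag):
    """Linear decision value w . x + b for a sparse bag of n-grams."""
    return sum(w.get(feat, 0.0) * count for feat, count in bag.items()) + b


def train_linear_svm(bags, labels, epochs=20, eta=0.1, lam=0.01):
    """Fit weights by subgradient descent on the L2-regularized hinge loss.

    labels must be +1 (relevant) or -1 (not relevant).
    """
    w, b = {}, 0.0
    for _ in range(epochs):
        for bag, y in zip(bags, labels):
            margin = y * score(w, b, bag)
            # Regularization step: shrink every weight slightly.
            for feat in w:
                w[feat] *= 1.0 - eta * lam
            # Hinge-loss step: only examples inside the margin move the weights.
            if margin < 1.0:
                for feat, count in bag.items():
                    w[feat] = w.get(feat, 0.0) + eta * y * count
                b += eta * y
    return w, b


def predict(w, b, text):
    """Classify a title/abstract string as +1 or -1."""
    return 1 if score(w, b, ngram_bag(text)) >= 0 else -1


# Hypothetical toy data: +1 = relevant to penetrance, -1 = not relevant.
toy_texts = [
    "cancer risk in BRCA1 mutation carriers",
    "cumulative risk of breast cancer for carriers",
    "surgical outcomes after knee replacement",
    "diet and exercise in healthy adults",
]
toy_labels = [1, 1, -1, -1]

w, b = train_linear_svm([ngram_bag(t) for t in toy_texts], toy_labels)

# Accuracy = fraction of papers classified correctly (here, on the training set).
train_accuracy = sum(
    predict(w, b, t) == y for t, y in zip(toy_texts, toy_labels)
) / len(toy_texts)
```

Because the decision rule is linear, each learned weight can be read directly as the contribution of one n-gram to the relevance score, which is one practical reason to try such a model alongside a CNN. A realistic setup would use a held-out set for evaluation rather than training accuracy.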
