Characterizing Diseases from Unstructured Text: A Vocabulary Driven Word2vec Approach

03/01/2016
by   Saurav Ghosh, et al.
Traditional disease surveillance can be augmented with a wide variety of real-time sources such as news and social media. However, these sources are generally unstructured, and constructing surveillance tools such as taxonomical correlations and trace mapping involves considerable human supervision. In this paper, we motivate a disease-vocabulary-driven word2vec model (Dis2Vec) to model diseases and their constituent attributes as word embeddings from the HealthMap news corpus. We use these word embeddings to automatically create disease taxonomies and evaluate our model against corresponding human-annotated taxonomies. We compare our model's accuracy against several state-of-the-art word2vec methods. Our results demonstrate that Dis2Vec outperforms traditional distributed vector representations in its ability to faithfully capture taxonomical attributes across different classes of diseases, such as endemic, emerging, and rare.
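As a rough illustration of how word embeddings could be used to attach attributes to a disease, the sketch below ranks candidate attribute terms by cosine similarity to a disease term. It is not the Dis2Vec model or the paper's evaluation procedure: the vectors are invented toy values, and the names `EMBEDDINGS`, `cosine`, and `rank_attributes` are hypothetical.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration only.
# They are NOT taken from the paper or from the HealthMap corpus.
EMBEDDINGS = {
    "dengue": [0.9, 0.1, 0.0],
    "mosquito": [0.8, 0.2, 0.1],
    "rodent": [0.0, 0.9, 0.3],
    "vaccine": [0.1, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_attributes(disease, candidates, embeddings, top_k=2):
    """Return the top_k candidate terms most similar to the disease term."""
    query = embeddings[disease]
    scored = [
        (term, cosine(query, embeddings[term]))
        for term in candidates
        if term in embeddings and term != disease
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [term for term, _ in scored[:top_k]]


if __name__ == "__main__":
    print(rank_attributes("dengue", ["mosquito", "rodent", "vaccine"], EMBEDDINGS))
```

In a real setting the toy dictionary would be replaced by vectors learned from a news corpus, and the ranked terms would be compared against a human-annotated taxonomy.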
