Inferring Disease and Gene Set Associations with Rank Coherence in Networks

by TaeHyun Hwang, et al.

A computational challenge in validating the candidate disease genes identified in a high-throughput genomic study is to elucidate the associations between the set of candidate genes and disease phenotypes. Conventional gene set enrichment analysis often fails to reveal associations between disease phenotypes and gene sets that consist of a short list of poorly annotated genes, because the existing annotations of disease-causative genes are incomplete. We propose a network-based computational approach called rcNet to discover the associations between gene sets and disease phenotypes. Assuming coherent associations between the genes ranked by their relevance to the query gene set and the disease phenotypes ranked by their relevance to the hidden target disease phenotypes of the query gene set, we formulate a learning framework that maximizes the rank coherence with respect to the known disease phenotype-gene associations. An efficient algorithm coupling ridge regression with label propagation, along with two variants, is introduced to find the optimal solution of the framework. We evaluated the rcNet algorithms and existing baseline methods with both leave-one-out cross-validation and a task of predicting recently discovered disease-gene associations in OMIM. The experiments demonstrated that the rcNet algorithms achieved the best overall rankings compared to the baselines. To further validate the reproducibility of the performance, we applied the algorithms to identify the target diseases of novel candidate disease genes obtained from recent studies using GWAS, DNA copy number variation analysis, and gene expression profiling. The algorithms ranked the target disease of the candidate genes at the top of the rank list in many cases across all three case studies. The rcNet algorithms are available as a webtool for disease and gene set association analysis at
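
To make the label propagation component mentioned in the abstract more concrete, the following is a minimal sketch of generic graph-based label propagation on a toy gene network. It is not the rcNet implementation: the function names, the symmetric normalization, the closed-form solve, and the value of `alpha` are all assumptions chosen for illustration, and the ridge regression coupling described in the abstract is not shown.

```python
import numpy as np

def normalize_adjacency(W):
    """Symmetrically normalize an adjacency matrix: D^{-1/2} W D^{-1/2}."""
    degree = W.sum(axis=1)
    inv_sqrt = np.zeros_like(degree, dtype=float)
    nonzero = degree > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(degree[nonzero])
    return W * inv_sqrt[:, None] * inv_sqrt[None, :]

def label_propagation(W, y, alpha=0.5):
    """Closed-form label propagation: f = (1 - alpha) (I - alpha S)^{-1} y.

    W     : symmetric, non-negative adjacency matrix of the gene network
    y     : initial scores (1 for genes in the query set, 0 otherwise)
    alpha : assumed trade-off in (0, 1) between network smoothness
            and fidelity to the initial scores
    """
    S = normalize_adjacency(np.asarray(W, dtype=float))
    n = S.shape[0]
    y = np.asarray(y, dtype=float)
    return (1.0 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, y)

# Hypothetical toy network: four genes connected in a chain 0 - 1 - 2 - 3.
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# The query gene set contains only gene 0.
y = np.array([1.0, 0.0, 0.0, 0.0])

scores = label_propagation(W, y)
ranking = np.argsort(-scores)  # genes ordered from most to least relevant
```

In this toy chain, genes closer to the query gene receive higher scores, so the resulting ranking follows network distance from gene 0. A ranking of this kind over genes is the sort of quantity the abstract describes comparing against a ranking over disease phenotypes.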


