GSAE: an autoencoder with embedded gene-set nodes for genomics functional characterization

05/21/2018
by Hung-I Harry Chen, et al.

Bioinformatics tools have been developed to interpret gene expression data at the gene set level, and these gene-set-based analyses improve biologists' ability to discover the functional relevance of their experimental designs. While these methods elucidate gene sets individually, associations between gene sets are rarely taken into consideration. Deep learning, an emerging machine learning technique in computational biology, can be used to generate an unbiased combination of gene sets, and to determine the biological relevance and analysis consistency of these combined gene sets by leveraging large genomic data sets. In this study, we propose the gene superset autoencoder (GSAE), a multi-layer autoencoder model that incorporates a priori defined gene sets so that the latent layer retains the crucial biological features. We introduce the concept of the gene superset, an unbiased combination of gene sets with weights trained by the autoencoder, where each node in the latent layer is a superset. With the model trained on genomic data from TCGA and evaluated against the accompanying clinical parameters, we show that gene supersets can discriminate tumor subtypes and carry prognostic value. We further demonstrate the biological relevance of the top component gene sets in the significant supersets. Using an autoencoder model with gene supersets at its latent layer, we demonstrate that gene supersets retain sufficient biological information with respect to tumor subtypes and clinical prognostic significance. Supersets also provide high reproducibility in survival analysis and accurate prediction of cancer subtypes.
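To make the architecture described in the abstract concrete, the following is a minimal NumPy sketch of a forward pass through an autoencoder with an embedded gene-set layer and a superset latent layer. It is an illustration under stated assumptions, not the authors' implementation: the binary membership mask, the `tanh` activations, the single linear decoder, the layer sizes, and every name (`make_membership_mask`, `init_params`, `forward`, `top_gene_sets`, `G1`, `SET_A`, and so on) are hypothetical, and training is omitted entirely.

```python
import numpy as np

def make_membership_mask(gene_names, gene_sets):
    """Binary (n_genes, n_sets) matrix: 1 where a gene belongs to a gene set."""
    index = {g: i for i, g in enumerate(gene_names)}
    mask = np.zeros((len(gene_names), len(gene_sets)))
    for j, members in enumerate(gene_sets.values()):
        for g in members:
            if g in index:
                mask[index[g], j] = 1.0
    return mask

def init_params(n_genes, n_sets, n_supersets, seed=0):
    """Random initial weights (assumed shapes; not taken from the paper)."""
    rng = np.random.default_rng(seed)
    return {
        "W_set": rng.normal(0.0, 0.1, (n_genes, n_sets)),       # gene -> gene set
        "W_super": rng.normal(0.0, 0.1, (n_sets, n_supersets)),  # gene set -> superset
        "W_dec": rng.normal(0.0, 0.1, (n_supersets, n_genes)),   # superset -> gene
    }

def forward(x, params, mask):
    """Encode expression profiles into supersets and reconstruct them."""
    # Masking keeps only connections from a gene to the sets it belongs to.
    set_scores = np.tanh(x @ (params["W_set"] * mask))
    # Each latent node (superset) is a weighted combination of all gene sets.
    supersets = np.tanh(set_scores @ params["W_super"])
    recon = supersets @ params["W_dec"]
    return set_scores, supersets, recon

def top_gene_sets(params, set_names, superset_idx, k=2):
    """Gene sets with the largest absolute weight into one superset node."""
    w = np.abs(params["W_super"][:, superset_idx])
    order = np.argsort(w)[::-1][:k]
    return [set_names[i] for i in order]

# Toy example with made-up gene and gene-set names.
genes = ["G1", "G2", "G3", "G4"]
gene_sets = {"SET_A": ["G1", "G2"], "SET_B": ["G3"]}
mask = make_membership_mask(genes, gene_sets)
params = init_params(n_genes=4, n_sets=2, n_supersets=3)
x = np.ones((5, 4))  # 5 samples, 4 genes
set_scores, supersets, recon = forward(x, params, mask)
```

In this sketch, the mask is what ties the model to prior knowledge: a gene outside every set (here `G4`) cannot influence any gene-set node. Ranking gene sets by their weight into a superset, as `top_gene_sets` does, is one plausible way to read off the "top component gene sets" mentioned in the abstract, though the paper's actual ranking criterion is not stated here.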

research
01/30/2019

Distinguishing between Normal and Cancer Cells Using Autoencoder Node Saliency

Gene expression profiles have been widely used to characterize patterns ...
research
09/07/2023

Evaluation of large language models for discovery of gene set function

Gene set analysis is a mainstay of functional genomics, but it relies on...
research
01/09/2018

GIFT: Guided and Interpretable Factorization for Tensors - An Application to Large-Scale Multi-platform Cancer Analysis

Given multi-platform genome data with prior knowledge of functional gene...
research
06/18/2019

Learning data representation using modified autoencoder for the integrative analysis of multi-omics data

In integrative analyses of omics data, it is often of interest to extrac...
research
07/30/2023

Redundancy-aware unsupervised rankings for collections of gene sets

The biological roles of gene sets are used to group them into collection...
research
12/08/2020

An Enhanced MA Plot with R-Shiny to Ease Exploratory Analysis of Transcriptomic Data

MA plots are used to analyze the genome-wide differences in gene express...
research
06/10/2022

Graph-in-Graph Network for Automatic Gene Ontology Description Generation

Gene Ontology (GO) is the primary gene function knowledge base that enab...
