BMC Systems Biology: GSAE: an autoencoder with embedded gene-set nodes for genomics functional characterization

Hung-I Harry Chen, Yu-Chiao Chiu, Tinghe Zhang, Songyao Zhang, Yufei Huang & Yidong Chen

Abstract

Background

Bioinformatics tools have been developed to interpret gene expression data at the gene set level, and these gene set-based analyses improve biologists’ ability to discover the functional relevance of their experimental designs. However, gene sets are usually interpreted individually, and associations between gene sets are rarely taken into consideration. Deep learning, an emerging machine learning technique in computational biology, can be used to generate unbiased combinations of gene sets. By leveraging large genomic data sets, it can also be used to determine the biological relevance and analysis consistency of these combined gene sets.
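Gene set-level analysis can be made concrete with a simple summary. For each sample, average the standardized (z-scored) expression of a gene set's member genes. The sketch below is an illustration only and is not the scoring used by GSAE. The function names and toy gene sets are hypothetical.

```python
from statistics import mean, pstdev


def zscore_rows(expr):
    """Standardize each gene's expression across samples (hypothetical helper)."""
    out = {}
    for gene, values in expr.items():
        mu = mean(values)
        sd = pstdev(values)
        # Genes with zero variance carry no signal; map them to 0.
        out[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return out


def gene_set_scores(expr, gene_sets):
    """Score each gene set per sample as the mean z-score of its member genes.

    expr      : {gene: [expression per sample]}
    gene_sets : {set name: [member genes]}
    Members absent from `expr` are ignored; sets with no measured members are skipped.
    """
    z = zscore_rows(expr)
    n_samples = len(next(iter(z.values())))
    scores = {}
    for name, members in gene_sets.items():
        present = [g for g in members if g in z]
        if not present:
            continue
        scores[name] = [mean(z[g][j] for g in present) for j in range(n_samples)]
    return scores
```

Each gene set collapses to one value per sample. Associations *between* sets, the gap noted above, are not captured by this kind of per-set score.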

Results

In this study, we proposed a gene superset autoencoder (GSAE), a multi-layer autoencoder model that incorporates a priori defined gene sets to retain crucial biological features in the latent layer. We introduced the concept of the gene superset, an unbiased combination of gene sets with weights trained by the autoencoder, where each node in the latent layer is a superset. After training the model on genomic data from The Cancer Genome Atlas (TCGA) and evaluating it against the accompanying clinical parameters, we showed that gene supersets can discriminate tumor subtypes and have prognostic value. We further demonstrated the biological relevance of the top component gene sets in the significant supersets.
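The abstract describes two pieces: gene-set nodes embedded in the network, and a latent layer of supersets, each a trained weighted combination of gene sets. The sketch below shows one way such an encoder could be wired. It assumes, as the title suggests, that each gene-set node connects only to its member genes through a fixed 0/1 mask, and that every superset node is fully connected to the gene-set layer. The class and method names, the tanh activation, and the random initialization are hypothetical choices. The decoder and the training loop (minimizing reconstruction error) are omitted.

```python
import math
import random


def build_mask(genes, gene_sets):
    """mask[k][i] = 1.0 if gene i belongs to gene set k, else 0.0."""
    index = {g: i for i, g in enumerate(genes)}
    mask = []
    for members in gene_sets.values():
        row = [0.0] * len(genes)
        for g in members:
            if g in index:
                row[index[g]] = 1.0
        mask.append(row)
    return mask


def matvec(W, x):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class GeneSetEncoder:
    """Encoder half of a gene-set-constrained autoencoder (illustrative sketch).

    Layer 1: one node per gene set, wired only to that set's genes (via the mask).
    Layer 2: superset nodes, fully connected to the gene-set nodes, so each
             superset is a weighted combination of gene sets.
    """

    def __init__(self, genes, gene_sets, n_supersets, seed=0):
        rng = random.Random(seed)
        self.set_names = list(gene_sets)
        self.mask = build_mask(genes, gene_sets)
        self.W_set = [[rng.gauss(0, 0.1) * m for m in row] for row in self.mask]
        self.W_super = [[rng.gauss(0, 0.1) for _ in self.set_names]
                        for _ in range(n_supersets)]

    def gene_set_activations(self, x):
        # Re-apply the mask on every pass so non-member weights never contribute,
        # even if a training step were to update them.
        masked = [[w * m for w, m in zip(wr, mr)] for wr, mr in zip(self.W_set, self.mask)]
        return [math.tanh(a) for a in matvec(masked, x)]

    def encode(self, x):
        """Map one expression profile to its superset (latent) values."""
        return [math.tanh(a) for a in matvec(self.W_super, self.gene_set_activations(x))]

    def top_gene_sets(self, superset, k=3):
        """Rank a superset's component gene sets by absolute weight."""
        weights = self.W_super[superset]
        order = sorted(range(len(weights)), key=lambda j: abs(weights[j]), reverse=True)
        return [(self.set_names[j], weights[j]) for j in order[:k]]
```

The mask is applied in the forward pass, so each gene-set node stays tied to its own genes. This is what keeps the latent supersets interpretable in terms of known gene sets. `top_gene_sets` mirrors the idea of inspecting a superset's top component gene sets.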

Conclusions

Using the autoencoder model with gene supersets at its latent layer, we demonstrated that gene supersets retain sufficient biological information regarding tumor subtypes and clinical prognostic significance. Supersets also provide highly reproducible survival analyses and accurate prediction of cancer subtypes.
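The abstract does not specify which survival test was used. The sketch below swaps in one standard option for illustration. It splits samples at the median value of a single superset node and compares the two groups with a two-sample log-rank test. The function names, the median split, and the toy data in any usage are assumptions.

```python
import math
from statistics import median


def split_by_median(activations):
    """Assign group 1 to samples whose superset value exceeds the median, else 0."""
    m = median(activations)
    return [1 if a > m else 0 for a in activations]


def logrank_test(times, events, groups):
    """Two-group log-rank test.

    times  : follow-up time per sample
    events : 1 if the event (e.g. death) was observed, 0 if censored
    groups : 0/1 group label per sample
    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e = 0.0  # observed minus expected events in group 1
    var = 0.0        # hypergeometric variance, summed over event times
    for t in event_times:
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n = len(at_risk)
        n1 = sum(at_risk)  # group-1 samples still at risk
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups) if ti == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var if var > 0 else 0.0
    # Upper tail of a chi-square with 1 df equals P(|Z| > sqrt(chi2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p
```

A hypothetical call would be `logrank_test(times, events, split_by_median(z_k))`, where `z_k` holds the per-sample values of superset node k. Repeating such a test across independent cohorts is one way to probe the reproducibility claimed above.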



Since 2004, the mission of UT Health San Antonio’s Greehey Children’s Cancer Research Institute (Greehey CCRI) has been to advance scientific knowledge relevant to childhood cancer, contribute to understanding its causes, and accelerate the translation of knowledge into novel therapies. Greehey CCRI strives to have a national and global impact on childhood cancer by discovering, developing, and disseminating new scientific knowledge. Our mission consists of three key areas: research, clinical, and education.
