Hui-Mei Tsai1,2,3,12 ∙ Tzu-Hung Hsiao2,4,5,12 ∙ Yu-Chiao Chiu6,7 ∙ Yufei Huang6,7
Highlights
• DeepGxP provides a framework to translate transcriptomes into proteomic insight
• DeepEnrich links RNA predictors to functional protein pathways and activities
• DeepEnrich reveals cancer type-specific EGFR and HER2 phosphorylation patterns
• Mutation effects are captured even without mutation data in model training
Summary
Proteins that impact phenotype and disease are often inferred from RNA expression, which poorly reflects protein abundance. We developed DeepGxP, a deep learning model trained on The Cancer Genome Atlas pan-cancer data, to predict protein abundance from transcriptomic profiles. DeepGxP outperformed conventional models, achieving a median Pearson’s correlation of 0.68 (n = 187) and predictive performance of 0.74 and 0.64 for proteins with high (≥0.31) and low (<0.31) self-gene/protein correlation, respectively. We also developed DeepEnrich, an integrated gradient-based interpretation framework that identifies predictor genes and enriched functions. For example, predictors of cyclin B1 and E2 are enriched in mitotic chromatid segregation and G2/M transition, respectively. In lung adenocarcinoma, we uncovered distinct EGFR/HER2 phosphorylation patterns in alveolar cells. In breast cancer, p53 protein, but not TP53 mRNA, correlated with survival. DeepGxP also accurately predicted the abundance of single-cell surface proteins, confirming cell identification. Our findings underscore DeepGxP’s potential in decoding gene-to-protein relationships for cancer biomarker discovery.
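The evaluation summarized above can be illustrated with a minimal sketch. It assumes the reported figures are medians of per-protein Pearson correlations between predicted and measured abundance, with proteins split at the 0.31 self-gene/protein correlation threshold. The function names, the dictionary layout, and the toy values are hypothetical and are not taken from the DeepGxP code or data.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson's correlation between two equal-length numeric sequences.

    Assumes both sequences have non-zero variance.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def summarize(predicted, measured, self_corr, threshold=0.31):
    """Median per-protein correlation, overall and by self-correlation group.

    predicted, measured: dict mapping protein -> per-sample abundances
    self_corr: dict mapping protein -> correlation with its own mRNA
    threshold: cut-off separating high (>=) from low (<) self-correlation
    """
    r = {p: pearson(predicted[p], measured[p]) for p in measured}
    high = [r[p] for p in r if self_corr[p] >= threshold]
    low = [r[p] for p in r if self_corr[p] < threshold]
    return {
        "all": median(r.values()),
        "high": median(high) if high else None,
        "low": median(low) if low else None,
    }


# Toy, made-up values for two hypothetical proteins "A" and "B".
measured = {"A": [1.0, 2.0, 3.0, 4.0], "B": [1.0, 2.0, 3.0, 4.0]}
predicted = {"A": [2.0, 4.0, 6.0, 8.0], "B": [4.0, 3.0, 2.0, 1.0]}
self_corr = {"A": 0.5, "B": 0.1}
summary = summarize(predicted, measured, self_corr)
```

Reporting the median rather than the mean keeps a few poorly predicted proteins from dominating the summary, which is one plausible reason for quoting it across the 187 proteins.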
