The cellular signal transduction system (CSTS) plays a fundamental role in maintaining cellular homeostasis, and perturbations of the CSTS lead to diseases such as cancers and diabetes. Most cellular signaling pathways eventually regulate gene expression, so gene expression can be used as a universal bioassay reflecting the state of the CSTS. However, the gene expression profile of a cell reflects a mixture of responses to all active signaling pathways, so it is a challenge to deconvolute the signals embedded in gene expression data. With the advent of high-throughput biotechnology, major public databases host the results of millions of gene expression assays, collected under diverse disease states as well as under experimental conditions designed by biologists to probe almost all aspects of the CSTS. The wealth of these big data provides unprecedented opportunities to investigate the CSTS under physiological and pathological conditions, but it also poses unprecedented challenges: how to reveal signals from convoluted data and turn big data into knowledge. We hypothesize that, given a sufficiently large compendium of gene expression data collected under diverse conditions in which different parts of the CSTS are perturbed (either by designed experiments or by diseases), the cellular signals embedded in the gene expression data can be revealed and their organization can be inferred using current state-of-the-art deep learning models. In this project, we will compile a comprehensive compendium of human gene expression data and then employ modern deep learning algorithms and supercomputers to mine the data.
We aim to reveal the major cellular signals that regulate gene expression under physiological and pathological conditions and to infer the organization of signals in the human CSTS. By combining the identified signals with genomic alteration data and drug response data, we further aim to identify pathways underlying diseases such as cancers, to use genomic data to predict the drug sensitivity of cancer cell lines, and to predict patient clinical outcomes, all in a pathway-centered manner.

Public Health Relevance

This project aims to develop and apply novel "deep learning" algorithms to mine a comprehensive compendium of genome-scale data in order to reveal the major signals regulating gene expression under physiological and pathological conditions. This knowledge can be further applied to understand disease mechanisms of cancers and guide personalized treatment of cancer patients.

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Research Project (R01)
Project #
5R01LM012011-02
Application #
9042426
Study Section
Special Emphasis Panel (ZLM1)
Program Officer
Ye, Jane
Project Start
2015-04-01
Project End
2019-03-31
Budget Start
2016-04-01
Budget End
2017-03-31
Support Year
2
Fiscal Year
2016
Total Cost
Indirect Cost
Name
University of Pittsburgh
Department
Miscellaneous
Type
Schools of Medicine
DUNS #
004514360
City
Pittsburgh
State
PA
Country
United States
Zip Code
15213
Ding, Michael Q; Chen, Lujia; Cooper, Gregory F et al. (2018) Precision Oncology beyond Targeted Therapy: Combining Omics Data with Machine Learning Matches the Majority of Cancer Cells to Effective Therapeutics. Mol Cancer Res 16:269-278
Young, Jonathan D; Cai, Chunhui; Lu, Xinghua (2017) Unsupervised deep learning reveals prognostically relevant subtypes of glioblastoma. BMC Bioinformatics 18:381
Yan, Gaibo; Chen, Vicky; Lu, Xinghua et al. (2017) A signal-based method for finding driver modules of breast cancer metastasis to the lung. Sci Rep 7:10023
Huang, Tianzhi; Kim, Chung Kwon; Alvarez, Angel A et al. (2017) MST4 Phosphorylation of ATG4B Regulates Autophagic Activity, Tumorigenicity, and Radioresistance in Glioblastoma. Cancer Cell 32:840-855.e8
Chen, Vicky; Paisley, John; Lu, Xinghua (2017) Revealing common disease mechanisms shared by tumors of different tissues of origin through semantic representation of genomic alterations and topic modeling. BMC Genomics 18:105
Huang, Tianzhi; Alvarez, Angel A; Pangeni, Rajendra P et al. (2016) A regulatory circuit of miR-125b/miR-20b and Wnt signalling controls glioblastoma phenotypes through FZD6-modulated pathways. Nat Commun 7:12885
Hill, Steven M; Heiser, Laura M; Cokelaer, Thomas et al. (2016) Inferring causal molecular networks: empirical assessment through a community-based effort. Nat Methods 13:310-8
Chen, Lujia; Cai, Chunhui; Chen, Vicky et al. (2016) Learning a hierarchical representation of the yeast transcriptomic machinery using an autoencoder model. BMC Bioinformatics 17 Suppl 1:9
Lu, Songjian; Cai, Chunhui; Yan, Gonghong et al. (2016) Signal-Oriented Pathway Analyses Reveal a Signaling Complex as a Synthetic Lethal Target for p53 Mutations. Cancer Res 76:6785-6794
Lu, Songjian; Mandava, Gunasheil; Yan, Gaibo et al. (2016) An exact algorithm for finding cancer driver somatic genome alterations: the weighted mutually exclusive maximum set cover problem. Algorithms Mol Biol 11:11

Showing the most recent 10 out of 16 publications