The goal of our PGDAC is to improve understanding of the proteogenomic complexity of tumors. Towards this goal, our First Aim is to apply network-based systems learning to reveal causative molecular regulatory relationships contributing to a variety of cancer phenotypes using CPTAC proteomic/genomic data. We will start with a mixed effects model to (1) correct batch effects in data from multiplex proteomics experiments; and (2) handle the large amount of missing data arising from abundance-dependent missing mechanisms in proteomic data (Aim 1.1). We will then utilize a multivariate penalized regression framework to construct global regulatory networks between genomic alterations (such as DNA mutations, CNA, and methylation) and the abundances of proteins and their post-translational modifications (PTMs) (Aim 1.2). Such regulatory networks help to elucidate how protein or pathway activities are shaped by genomic alterations in tumor cells. We will also construct protein co-expression networks based on global-, phospho-, glyco- and other PTM-proteomics data (Aim 1.3). When constructing these networks, we will use advanced computational tools to effectively borrow information from the literature, databases, and transcriptome profiles. In addition, we will model tumor and normal tissues jointly, so that tumor-specific interactions and network modules will be inferred with better accuracy.
Both Aims 1.2 and 1.3 will lead to a large collection of network modules, as well as functionally related protein sets (e.g. proteins regulated by the same genomic alteration). These network modules and protein sets will then be tested for their associations with disease phenotypes (Aim 1.4). Finally, we will derive a more integrated view of commonalities and differences across multiple tumor types via a Pan-cancer analysis (Aim 1.5).
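As an illustration of the kind of penalized regression described for Aim 1.2, the following is a minimal sketch, not the project's actual method. It assumes a plain lasso (L1) penalty fitted by proximal gradient descent, whereas the abstract does not specify which penalty the framework uses. The data are simulated rather than CPTAC data, and all function and variable names (`lasso_ista`, `simulate`, `edges`) are hypothetical. Rows of the coefficient matrix stand for genomic alterations and columns for protein abundances; each non-zero entry is read as one candidate regulatory edge.

```python
import numpy as np


def soft_threshold(z, t):
    """Element-wise soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_ista(X, Y, lam, n_iter=500):
    """Minimise (1/2n) * ||Y - X B||_F^2 + lam * ||B||_1 by proximal gradient.

    X: (n samples, p alterations); Y: (n samples, q proteins).
    Returns B with shape (p, q).
    """
    n, p = X.shape
    B = np.zeros((p, Y.shape[1]))
    # Lipschitz constant of the smooth part: largest eigenvalue of X^T X / n.
    lipschitz = np.linalg.norm(X, 2) ** 2 / n
    for _ in range(n_iter):
        grad = X.T @ (X @ B - Y) / n
        B = soft_threshold(B - grad / lipschitz, lam / lipschitz)
    return B


def simulate(seed=0, n=200, p=10, q=5):
    """Simulated data with two true alteration-to-protein effects (assumption)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    B_true = np.zeros((p, q))
    B_true[0, 0] = 2.0    # alteration 0 raises protein 0
    B_true[3, 2] = -1.5   # alteration 3 lowers protein 2
    Y = X @ B_true + 0.1 * rng.standard_normal((n, q))
    return X, Y, B_true


X, Y, B_true = simulate()
B_hat = lasso_ista(X, Y, lam=0.2)

# Each non-zero coefficient is one (alteration, protein) edge in the network.
edges = [(int(j), int(k)) for j, k in zip(*np.nonzero(B_hat))]
print(edges)
```

The L1 penalty sets most coefficients exactly to zero, which is what makes the fitted matrix readable as a sparse network. In practice the penalty weight `lam` would be chosen by cross-validation or a similar criterion rather than fixed by hand.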
Our Second Aim is to further develop methods, software, and web tools to optimize the data analysis in our PGDAC. We will develop novel statistical/computational tools tailored to CPTAC proteomics data; implement these methods as computationally efficient software; and construct an integrated data analysis pipeline (Aim 2.1). We also plan to develop a set of web service tools for visualization and biological annotation of protein networks and clinical interpretation of proteomic data (Aim 2.2).
Our Third Aim is to nominate novel protein-based cancer biomarkers and drug targets for further investigation by targeted proteomics assays. We will first utilize a prediction-based scoring system to identify protein biomarkers that predict altered cancer pathways, network modules and individual oncogenes; disease outcome and drug resistance; and therapeutically distinct disease subtypes (Aim 3.1). We will then utilize network-based tools to identify driver proteins within selected protein signature sets (Aim 3.2). These driver proteins could play important roles in shaping the overall function of the regulatory system, and thus serve as good candidates for cancer biomarkers and drug targets. In biomarker selection, we will also take into consideration domain knowledge of different diseases, as well as technical constraints on developing targeted proteomics assays.

Public Health Relevance

The goal of the proposed Proteogenomic Data Analysis Center is to elucidate proteogenomic complexity underlying tumors by employing network based systems learning on CPTAC proteomic/genomic data. Our team brings together extensive expertise in systems learning and proteomics data modeling. The success of this application will generate valuable resources on (1) high-confidence candidate biomarkers and drug targets; (2) new knowledge of cancer biology that involves integrated genomic, transcriptomic, and proteomic data; and (3) new bioinformatics tools that aid in the mining, visualization and interpretation of large-scale datasets.

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Resource-Related Research Projects--Cooperative Agreements (U24)
Project #: 5U24CA210993-05
Application #: 9994849
Study Section: Special Emphasis Panel (ZCA1)
Program Officer: Rodriguez, Henry
Project Start: 2016-09-19
Project End: 2021-08-31
Budget Start: 2020-09-01
Budget End: 2021-08-31
Support Year: 5
Fiscal Year: 2020
Total Cost:
Indirect Cost:
Name: Icahn School of Medicine at Mount Sinai
Department: Genetics
Type: Schools of Medicine
DUNS #: 078861598
City: New York
State: NY
Country: United States
Zip Code: 10029
Wang, Jiebiao; Liu, Qianying; Pierce, Brandon L et al. (2018) A meta-analysis approach with filtering for identifying gene-level gene-environment interactions. Genet Epidemiol 42:434-446
Whiteaker, Jeffrey R; Zhao, Lei; Saul, Rick et al. (2018) A Multiplexed Mass Spectrometry-Based Assay for Robust Quantification of Phosphosignaling in Response to DNA Damage. Radiat Res 189:505-518
Fu, Rong; Wang, Pei; Ma, Weiping et al. (2017) A statistical method for detecting differentially expressed SNVs based on next-generation RNA-seq data. Biometrics 73:42-51
Petralia, Francesca; Aushev, Vasily N; Gopalakrishnan, Kalpana et al. (2017) A new method to study the change of miRNA-mRNA interactions due to environmental exposures. Bioinformatics 33:i199-i207
Cohain, Ariella; Divaraniya, Aparna A; Zhu, Kuixi et al. (2017) Exploring the reproducibility of probabilistic causal molecular network models. Pac Symp Biocomput 22:120-131
Hoffman, Gabriel E; Schadt, Eric E (2016) variancePartition: interpreting drivers of variation in complex gene expression studies. BMC Bioinformatics 17:483
Petralia, Francesca; Song, Won-Min; Tu, Zhidong et al. (2016) New Method for Joint Network Analysis Reveals Common and Different Coexpression Patterns among Genes and Proteins in Breast Cancer. J Proteome Res 15:743-54
Sanderson, Saskia C; Linderman, Michael D; Suckiel, Sabrina A et al. (2016) Motivations, concerns and preferences of personal genome sequencing research participants: Baseline findings from the HealthSeq project. Eur J Hum Genet 24:14-20
Wang, Xianlong; Qin, Li; Zhang, Hexin et al. (2015) A regularized multivariate regression approach for eQTL analysis. Stat Biosci 7:129-146
Danaher, P; Paul, D; Wang, P (2015) Covariance-based analyses of biological pathways. Biometrika 102:533-544

Showing the most recent 10 out of 11 publications