The goal of our PGDAC is to improve understanding of the proteogenomic complexity of tumors. Towards this goal, our First Aim is to apply network-based systems learning to CPTAC proteomic/genomic data to reveal causative molecular regulatory relationships that contribute to a variety of cancer phenotypes. We will start with a mixed effects model to (1) correct for batch effects in data from multiplex proteomics experiments; and (2) handle the large amount of missing data that arises from abundance-dependent missing mechanisms in proteomic data (Aim 1.1). We will then utilize a multivariate penalized regression framework to construct global regulatory networks between genomic alterations (such as DNA mutations, CNA, and methylation) and the abundances of proteins and their post-translational modifications (PTMs) (Aim 1.2). Such regulatory networks help to elucidate how protein or pathway activities are shaped by genomic alterations in tumor cells. We will also construct protein co-expression networks based on global-, phospho-, glyco- and other PTM-proteomics data (Aim 1.3). When constructing these networks, we will use advanced computational tools to effectively borrow information from the literature, databases, and transcriptome profiles. In addition, we will model tumor and normal tissues jointly, so that tumor-specific interactions and network modules can be inferred with better accuracy.
Aims 1.2 and 1.3 will both yield a large collection of network modules, as well as functionally related protein sets (e.g., proteins regulated by the same genomic alteration). These network modules and protein sets will then be tested for their associations with disease phenotypes (Aim 1.4). Finally, we will derive a more integrated view of commonalities and differences across multiple tumor types via a pan-cancer analysis (Aim 1.5).
Our Second Aim is to further develop methods, software, and web tools to optimize the data analysis in our PGDAC. We will develop novel statistical/computational tools tailored to CPTAC proteomics data; implement these methods as computationally efficient software; and construct an integrated data analysis pipeline (Aim 2.1). We also plan to develop a set of web service tools for visualization and biological annotation of protein networks and for clinical interpretation of proteomic data (Aim 2.2).
Our Third Aim is to nominate novel protein-based cancer biomarkers and drug targets for further investigation by targeted proteomics assays. We will first utilize a prediction-based scoring system to identify protein biomarkers that predict altered cancer pathways, network modules, and individual oncogenes; disease outcome and drug resistance; and therapeutically distinct disease subtypes (Aim 3.1). We will then utilize network-based tools to identify driver proteins in selected protein signature sets (Aim 3.2). These driver proteins could play important roles in shaping the overall function of the regulatory system, and thus serve as good candidates for cancer biomarkers and drug targets. In biomarker selection, we will also take into account domain knowledge of the different diseases, as well as the technical constraints of developing targeted proteomics assays.
The goal of the proposed Proteogenomic Data Analysis Center is to elucidate the proteogenomic complexity underlying tumors by applying network-based systems learning to CPTAC proteomic/genomic data. Our team brings together extensive expertise in systems learning and proteomics data modeling. If successful, this project will generate valuable resources, including (1) high-confidence candidate biomarkers and drug targets; (2) new knowledge of cancer biology that draws on integrated genomic, transcriptomic, and proteomic data; and (3) new bioinformatics tools that aid in the mining, visualization, and interpretation of large-scale datasets.