Mass spectrometry (MS)-based proteomics is increasingly used in conjunction with genome profiling and next-generation sequencing (NGS) for large-scale characterization of cancer samples, including cell lines, patient-derived xenografts and tumor material. Publications from the NCI-CPTAC program and others have highlighted the utility of proteogenomic analysis in elucidating cancer biology and in identifying aberrant proteins and signaling networks in cancer. However, a high-throughput pipeline that implements a range of analyses to transform genomic and proteomic data into information easily accessible to scientists is still lacking. We propose an integrated high-throughput proteogenomic data analysis center (PGDAC) to address this immediate need. The PGDAC will build on Firehose, a platform developed by our group that has set the standard for genome and NGS data analysis, to implement a flexible, robust, automated and reproducible proteogenomic data analysis pipeline and visualization portal. This cloud-based, near-real-time platform will include a robust version of the pipeline created for our group's recently completed proteogenomic studies, and will also incorporate new tools and algorithms, especially for the analysis and visualization of phosphoproteomic data. The result will be an automated, version-controlled pipeline that provides an integrated view of clinical, genomic (CNA, mRNA, RNA-seq, mutation) and proteomic (global proteome, phosphoproteome and other PTM) data, with analyses including correlation, clustering, marker identification and pathway enrichment. The FireBrowse graphical user interface, combined with other visualization tools, will give non-computational scientists a familiar, accessible and intuitive way to explore the results interactively. Analysis results and reports will be hosted on a local web portal in addition to being uploaded to the DCC.
The proteogenomic data analysis pipeline will support biomarker selection and therapeutic target identification in disease-specific and pan-cancer cohorts, and will quantify changes in cellular signaling networks caused by site-specific post-translational modifications and genetic aberrations.

Public Health Relevance

Genetic alterations in human cancer have been systematically studied over the past decade, but the impact of most of these changes on the proteome, the functional end of the genome, is poorly understood. In this project we will implement a large-scale data analysis platform that integrates clinical, genomic and mass spectrometry-based proteomic data from the same samples and tumors, and applies novel big-data algorithms and analysis methods to shed new light on the biology of cancer and on response and resistance to drug treatment and, importantly, to help identify new targets for therapeutic intervention.

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Resource-Related Research Projects--Cooperative Agreements (U24)
Project #: 5U24CA210979-05
Application #: 10004584
Study Section: Special Emphasis Panel (ZCA1)
Program Officer: Rodriguez, Henry
Project Start: 2016-09-15
Project End: 2021-08-31
Budget Start: 2020-09-01
Budget End: 2021-08-31
Support Year: 5
Fiscal Year: 2020
Organization: Broad Institute, Inc.
DUNS #: 623544785
City: Cambridge
State: MA
Country: United States
Zip Code: 02142

Archer, Tenley C; Ehrenberger, Tobias; Mundt, Filip et al. (2018) Proteomics, Post-translational Modifications, and Integrative Analyses Reveal Molecular Heterogeneity within Medulloblastoma Subgroups. Cancer Cell 34:396-410.e8
Ruggles, Kelly V; Krug, Karsten; Wang, Xiaojing et al. (2017) Methods, Tools and Current Perspectives in Proteogenomics. Mol Cell Proteomics 16:959-981