Mass spectrometry-based proteomics is a key technology for the identification, quantification and comparison of proteins and their post-translational modifications across all aspects of biology. One major barrier in proteomics workflows is the paucity of flexible and customizable computational frameworks for generating data analysis software pipelines. Current software pipelines are often static, with limited flexibility beyond the specific analyses for which they were created, and generally lag behind due to little or no maintenance. These tools therefore lack broad functionality and improvements in areas such as statistical validation, limiting the potential of the software to mere spectrum matching through black-box tools. To address these barriers, we have been developing, maintaining and distributing cutting-edge proteomics computational tools and standards in data analysis over the last 12 years through our program suite called the Trans-Proteomic Pipeline. Concomitant with software development, we ensure wide community adoption through extensive tutoring of all interested users in the proteomics community. Through our development, we have provided new functionality for proteomics data analyses with both new and improved statistical validation and global qualification. However, with new styles of mass spectrometry instrumentation that now attempt to provide comprehensive analysis, software tools must be continually maintained and new functionality developed in an extensible and flexible framework that ensures robust, routine operation, so that the tools provide the user community, from novices to the most demanding power users, with trusted results. The Trans-Proteomic Pipeline has been the first and most continually developed and maintained product for these requirements.
Our goal of providing all these tools as a fully open-source and freely available complement of programs ensures wide adoption and permits community input into tool development for the broadest and most needed functionality possible. This continuing program of development and maintenance of the Trans-Proteomic Pipeline builds on the successful approach of robust tool development with a focus on "from start to the end analysis" of proteomics data. With ever-increasing data collection rates at a "Moore's Law" level, this program will continue to develop tools to analyze these larger and larger datasets. We will develop and integrate tools for new styles of proteomic data collection such as multiplexed data-independent analysis capable of providing near-full proteome quantitation in a single analysis, integration of next-generation RNA-seq genomic analysis for sample-specific databases, post-translational modification statistical analysis for confident site-specific identification, and the implementation of new selected-reaction monitoring capabilities that drive proteomics as the next-generation "Western Blot". All these efforts are underpinned by a strong computational and biochemistry-focused background to ensure the tools are well written and have maximum relevance to biology.

Public Health Relevance

Mass spectrometry-based proteomics is a key technology for the identification and quantification of proteins in many types of biological samples. Proteomics can be used to determine aberrant expression of proteins that can indicate which cellular systems are under stress or the cause of a disease, and it can be used for the discovery of disease biomarkers and the validation of therapeutic strategies. At a more basic level, proteomics via mass spectrometry allows for an understanding of how the 20,000 proteins in a human can interact in complex biological systems. However, when compared to other biomedical measurement technologies, this incredibly powerful technology is limited in its scope and its pervasiveness by the data handling systems that process the output of the mass spectrometers. To understand these data, one needs to validate them quickly, comprehensively and statistically, with confidence, so that all users, from the novice to the expert, can understand, compare and share data. The opportunity presented within this grant is to take the leading freely available software suite for the analysis of mass spectrometry proteomics data, the Trans-Proteomic Pipeline (TPP), and make it easier to use, maintain and extend, to ensure that all users of mass spectrometry gain the most benefit from the powerful mass spectrometry technology used for proteomics.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
2R01GM087221-04
Application #
8698152
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Sheeley, Douglas
Project Start
2009-04-01
Project End
2018-03-31
Budget Start
2014-04-01
Budget End
2015-03-31
Support Year
4
Fiscal Year
2014
Total Cost
$521,297
Indirect Cost
$234,367
Name
Institute for Systems Biology
Department
Type
DUNS #
135646524
City
Seattle
State
WA
Country
United States
Zip Code
98109
Shao, Wenguang; Pedrioli, Patrick G A; Wolski, Witold et al. (2018) The SysteMHC Atlas project. Nucleic Acids Res 46:D1237-D1247
Menschaert, Gerben; Wang, Xiaojing; Jones, Andrew R et al. (2018) The proBAM and proBed standard formats: enabling a seamless integration of genomics and proteomics data. Genome Biol 19:12
Zhang, Chengxin; Wei, Xiaoqiong; Omenn, Gilbert S et al. (2018) Structure and Protein Interaction-based Gene Ontology Annotations Reveal Likely Functions of Uncharacterized Proteins on Human Chromosome 17. J Proteome Res :
Lee, Joon-Yong; Choi, Hyungwon; Colangelo, Christopher M et al. (2018) ABRF Proteome Informatics Research Group (iPRG) 2016 Study: Inferring Proteoforms from Bottom-up Proteomics Data. J Biomol Tech 29:39-45
Maixner, Frank; Turaev, Dmitrij; Cazenave-Gassiot, Amaury et al. (2018) The Iceman's Last Meal Consisted of Fat, Wild Meat, and Cereals. Curr Biol 28:2348-2355.e9
Hoopmann, Michael R; Winget, Jason M; Mendoza, Luis et al. (2018) StPeter: Seamless Label-Free Quantification with the Trans-Proteomic Pipeline. J Proteome Res 17:1314-1320
Slama, Patrick; Hoopmann, Michael R; Moritz, Robert L et al. (2018) Robust determination of differential abundance in shotgun proteomics using nonparametric statistics. Mol Omics 14:424-436
Choi, Meena; Eren-Dogu, Zeynep F; Colangelo, Christopher et al. (2017) ABRF Proteome Informatics Research Group (iPRG) 2015 Study: Detection of Differentially Abundant Proteins in Label-Free Quantitative LC-MS/MS Experiments. J Proteome Res 16:945-957
Deutsch, Eric W; Csordas, Attila; Sun, Zhi et al. (2017) The ProteomeXchange consortium in 2017: supporting the cultural change in proteomics public data deposition. Nucleic Acids Res 45:D1100-D1106
McCord, James; Sun, Zhi; Deutsch, Eric W et al. (2017) The PeptideAtlas of the Domestic Laying Hen. J Proteome Res 16:1352-1363

Showing the most recent 10 out of 80 publications