Mass spectrometry (MS)-based proteomics is a key technology for the identification, quantification, and comparison of proteins and their post-translational modifications across all aspects of biology. MS datasets have grown ever larger as instrumentation has advanced, as has the archive of experimental data available for re-analysis and comparison. To help the proteomics community cope with big data, we have been developing an end-to-end suite of data processing and analysis tools called the Trans-Proteomic Pipeline (TPP). This project will advance the widely used TPP software suite to make it even more useful to its user community, enabling users to perform their analyses faster and with less human effort, and adding capabilities that are currently unavailable or only in testing stages. We will add full end-to-end TPP support for data-independent acquisition (DIA) workflows, such as SWATH-MS, and proteogenomics workflows, such as RNA-seq-assisted proteomics. The TPP already has partial support for these workflows, but it needs additional finishing, hardening, and extension to high-capacity cloud computing platforms to become truly useful to all our users. As protein abundance quantification becomes essential to more experiments, we will enhance our existing tools for isotopically and isobarically labeled data as well as label-free data, and build a new analysis workbench that gives our users access to advanced statistical analysis and comparison routines that already exist but are difficult for many users to handle.
In addition to bundling this statistical software, we will build a framework that allows users to take their quantitative results from any of the traditional or new workflows, transform them into the formats that the statistical packages require, and then visualize and interactively explore the outputs of the statistical analysis, so that trends can be uncovered and outliers verified in the original data. We will make a substantial number of smaller enhancements to the TPP suite to make the tools smarter, relieving users of the burden of setting parameters and shepherding data through various tools. Based on the feedback we receive from our users, we will develop new modes of operation for existing tools to handle the challenges they present. We will continue our many outreach efforts, which include teaching software courses several times per year, hosting workshops and booths at scientific conferences to meet with and gain feedback from our users, and developing many more publicly available tutorials and recipes for applying the tools in various circumstances. We will also continue to disseminate the advancements of the TPP through articles in the literature and presentations at scientific conferences. In summary, this proposed program will continue to advance the TPP as the preeminent free and open-source end-to-end software analysis tool suite for routine and big data applications in proteomics.

Public Health Relevance

The continued development and maintenance of the Trans-Proteomic Pipeline software will enable and accelerate the application of mass spectrometry-based proteomics to the study of the dynamic nature of proteins in human health and disease, in diagnostic techniques, and in the development of therapeutics. This will be accomplished by modernizing and hardening the Trans-Proteomic Pipeline and providing extensive tutorials, making it easier to use and deploy and applicable to new users and environments, and by extending its capabilities to important emerging proteomics workflows, so that all users gain the most benefit from the powerful mass spectrometry technology used for proteomics.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM087221-09
Application #: 9748531
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Ravichandran, Veerasamy
Project Start: 2010-09-01
Project End: 2022-04-30
Budget Start: 2019-05-01
Budget End: 2020-04-30
Support Year: 9
Fiscal Year: 2019
Total Cost:
Indirect Cost:
Name: Institute for Systems Biology
Department:
Type:
DUNS #: 135646524
City: Seattle
State: WA
Country: United States
Zip Code: 98109
Shao, Wenguang; Pedrioli, Patrick G A; Wolski, Witold et al. (2018) The SysteMHC Atlas project. Nucleic Acids Res 46:D1237-D1247
Menschaert, Gerben; Wang, Xiaojing; Jones, Andrew R et al. (2018) The proBAM and proBed standard formats: enabling a seamless integration of genomics and proteomics data. Genome Biol 19:12
Zhang, Chengxin; Wei, Xiaoqiong; Omenn, Gilbert S et al. (2018) Structure and Protein Interaction-based Gene Ontology Annotations Reveal Likely Functions of Uncharacterized Proteins on Human Chromosome 17. J Proteome Res
Lee, Joon-Yong; Choi, Hyungwon; Colangelo, Christopher M et al. (2018) ABRF Proteome Informatics Research Group (iPRG) 2016 Study: Inferring Proteoforms from Bottom-up Proteomics Data. J Biomol Tech 29:39-45
Maixner, Frank; Turaev, Dmitrij; Cazenave-Gassiot, Amaury et al. (2018) The Iceman's Last Meal Consisted of Fat, Wild Meat, and Cereals. Curr Biol 28:2348-2355.e9
Hoopmann, Michael R; Winget, Jason M; Mendoza, Luis et al. (2018) StPeter: Seamless Label-Free Quantification with the Trans-Proteomic Pipeline. J Proteome Res 17:1314-1320
Slama, Patrick; Hoopmann, Michael R; Moritz, Robert L et al. (2018) Robust determination of differential abundance in shotgun proteomics using nonparametric statistics. Mol Omics 14:424-436
Zolg, Daniel P; Wilhelm, Mathias; Schnatbaum, Karsten et al. (2017) Building ProteomeTools based on a complete synthetic human proteome. Nat Methods 14:259-262
Schwenk, Jochen M; Omenn, Gilbert S; Sun, Zhi et al. (2017) The Human Plasma Proteome Draft of 2017: Building on the Human Plasma PeptideAtlas from Mass Spectrometry and Complementary Assays. J Proteome Res 16:4299-4310
Choi, Meena; Eren-Dogu, Zeynep F; Colangelo, Christopher et al. (2017) ABRF Proteome Informatics Research Group (iPRG) 2015 Study: Detection of Differentially Abundant Proteins in Label-Free Quantitative LC-MS/MS Experiments. J Proteome Res 16:945-957

Showing the most recent 10 out of 80 publications