Mass spectrometry (MS) based proteomics is a key technology for the identification, quantification and comparison of proteins and their post-translational modifications across all aspects of biology. MS datasets have been growing ever larger with the advancement of instrumentation, as has the archive of experimental data available for re-analysis and comparison. In order to meet the needs of the proteomics community for coping with big data, we have been developing our end-to-end suite of data processing and analysis tools, called the Trans-Proteomic Pipeline (TPP). This project will advance the widely used TPP software suite to become even more useful to its user community, enabling them to perform their analyses even faster with less human effort, and adding capabilities that are currently unavailable or only in testing stages. We will add full end-to-end TPP support for data-independent acquisition (DIA) workflows, such as SWATH-MS, and proteogenomics workflows, such as RNA-seq assisted proteomics. The TPP already has partial support for these workflows, but needs additional finishing, hardening, and extension to high-capacity cloud computing platforms to become truly useful to all our users. As protein abundance quantification becomes essential to ever more experiments, we will enhance our existing tools for isotopic and isobaric labeled data as well as label-free data, and build a new analysis workbench that will give our users access to advanced statistical analysis and comparison routines that already exist but are difficult for many users to handle.
In addition to bundling this statistical software, we will build a framework that allows users to take their quantitative results from any of the traditional or new workflows, transform them into the formats that the statistical packages require, and then visualize and interactively explore the outputs of statistical analysis, so trends can be uncovered and outliers verified in the original data. A substantial number of smaller enhancements to the TPP suite will be made to make the tools smarter, so that users are relieved of the burden of setting parameters and shepherding data through various tools. Based on the feedback we receive from our users, we will develop new modes of operation for existing tools to handle the challenges they present. We will continue our many outreach efforts, which include teaching software courses several times per year, hosting workshops and booths at scientific conferences to meet with and gain feedback from our users, and developing many more publicly available tutorials and recipes for using the tools and applying them to various circumstances. We will of course continue to disseminate the advancements of the TPP with articles in the literature and with presentations at scientific conferences. In summary, this proposed program will continue to advance the TPP as the preeminent free and open-source end-to-end software analysis tool suite for routine and big data applications in proteomics.
The continued development and maintenance of the Trans-Proteomic Pipeline software will enable and accelerate the application of mass spectrometry based proteomics to the study of the dynamic nature of proteins in human health and disease, in diagnostic techniques, and in the development of therapeutics. This will be accomplished by enhancing the Trans-Proteomic Pipeline through modernization, hardening and extensive tutorials, making it easier to use and deploy and applicable to new users and environments, and by extending its capabilities to important emerging proteomics workflows, so that all users of mass spectrometry gain the most benefit from this powerful technology.