Mass spectrometry (MS)-based proteomics is currently the most widely used technology for the analysis of complex protein mixtures. It can detect and quantify the abundance of thousands of proteins and their variants, post-translational modifications, and interactions per experiment. There is a robust set of open, standardized data formats for encoding data and metadata from most stages of MS proteomics analysis, developed by the Proteomics Standards Initiative (PSI). However, there is currently no standardized mechanism for universally referencing a spectrum that is used in an analysis or held up as evidence for a published claim. Further, despite the widely recognized significant advantages of spectrum matching approaches, an approved PSI standard for the storage and exchange of reference spectra in the form of spectral libraries is still glaringly absent. Here we propose a major advancement in data standards for proteomics mass spectra with the development of three interrelated standards. First, in order to solve the difficulty in identifying and accessing a specific spectrum in resources throughout the world, we will develop a universal spectrum identifier standard that can be widely used to reference, locate, and access a specific spectrum. Second, building on PSI's extensive experience in developing official standard formats that are widely used, we will overhaul the current set of crude spectral library formats and develop a new standardized and comprehensive spectral library format that will be effective for the storage, use, and exchange of reference spectra. Third, we will develop a standard application programming interface that deploys the standards to the whole community by enabling users and automated software to query and exchange information about spectra, peptides, and proteins. These standards will be developed according to the effective methodologies that the PSI has developed since its inception in 2002.
This means that we will assemble the important stakeholders from all over the world to jointly develop the standards and create specification documents and examples. These specification documents will then undergo the official PSI document process, which subjects each proposed standard to three rounds of iterative review and refinement. We will then develop open-source software that enables the use of these standards in multiple programming languages in order to promote widespread usage. Finally, we will implement these standards via these software libraries at the three largest ProteomeXchange proteomics data repositories, which will ensure high visibility. The development of these three interrelated standards will achieve a substantial advance for the field of proteomics MS, and may well extend to MS-based metabolomics.

Public Health Relevance

Mass spectrometry-based proteomics is a key technology for the identification and quantification of proteins in many types of biological samples. It can be used to determine how protein abundances change in cellular systems perturbed by stress or disease. We propose a substantial advancement of this technology by developing standards that enable researchers to find, access, and reuse individual reference mass spectra, among other types of data generated in a proteomics experiment.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Resource-Related Research Projects (R24)
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Ravichandran, Veerasamy
Institute for Systems Biology
United States