Computational analysis of mass spectrometry (MS) data presents a significant challenge, especially when MS-based proteomic approaches are applied to profile complex protein mixtures such as human serum or tissues. We propose to develop a set of statistical models and algorithms that will enable robust, accurate, and transparent analysis of large-scale quantitative proteomic datasets generated by tandem mass spectrometry (MS/MS) from human clinical cancer specimens. To achieve this, we will:
1) develop novel data analysis methods and algorithms for the statistical validation of peptide assignments to MS/MS spectra, applicable regardless of the MS instrumentation, sample preparation protocol, or MS/MS database search software used;
2) develop an integrated, probability-based informatics approach for assembling peptides into proteins and for inferring protein identities and changes in protein abundance between compared samples, thereby increasing the power of the shotgun proteomic approach to identify low-molecular-weight and low-abundance proteins, discriminate between protein isoforms, and detect post-translational processing events;
3) introduce multivariate metrics for assessing the quality of MS/MS data and design iterative computational strategies for reanalyzing high-quality spectra that remain unassigned;
4) develop statistical models for quantifying error rates in composite databases of peptide and protein identifications collected from different studies, thus enabling accurate cross-laboratory comparison, data mining, and selection of candidates for targeted proteomic profiling of clinical samples.
We will integrate these methods and tools into the existing open-source data analysis platform, the Trans-Proteomic Pipeline, and will disseminate the new tools, statistical methodologies, and educational materials to the proteomics community.
The ultimate goal of the proposed computational research is to enable fast, automated generation of high-quality proteomic datasets with accurately determined error rates, thus removing one of the main technical barriers currently plaguing the field of proteomics.
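To make "probability-based" identification and "accurately determined error rates" concrete, here is a minimal sketch in the spirit of tools such as PeptideProphet and ProteinProphet. It assumes that each peptide-spectrum match has already been given a posterior probability of being correct. The expected false discovery rate (FDR) among accepted matches is then the mean of (1 - p) over those matches. The protein-level rule, P = 1 - Π(1 - p_i), treats the supporting peptides as independent evidence. This is only the simplest starting point: it ignores shared (degenerate) peptides and other adjustments a real protein inference model needs, and it is an illustration, not the method proposed in this project. The function names and protein identifiers are hypothetical.

```python
from typing import Dict, List, Sequence


def estimated_fdr(probabilities: Sequence[float], threshold: float) -> float:
    """Expected false discovery rate among identifications with p >= threshold.

    Each accepted identification with posterior probability p contributes
    (1 - p) expected false positives; the estimate is their mean.
    """
    accepted = [p for p in probabilities if p >= threshold]
    if not accepted:
        return 0.0  # nothing accepted, so no expected false discoveries
    return sum(1.0 - p for p in accepted) / len(accepted)


def naive_protein_probability(peptide_probs: Sequence[float]) -> float:
    """Probability that at least one supporting peptide is correct.

    Simplifying assumption (hypothetical helper): peptide identifications are
    independent and each maps to this protein only (no shared peptides).
    """
    prob_all_wrong = 1.0
    for p in peptide_probs:
        prob_all_wrong *= 1.0 - p
    return 1.0 - prob_all_wrong


def protein_probabilities(evidence: Dict[str, List[float]]) -> Dict[str, float]:
    """Apply the naive combination rule to every protein in the evidence map."""
    return {prot: naive_protein_probability(ps) for prot, ps in evidence.items()}


if __name__ == "__main__":
    # Hypothetical posterior probabilities for six peptide-spectrum matches.
    psm_probs = [0.99, 0.95, 0.90, 0.60, 0.20, 0.05]
    print(round(estimated_fdr(psm_probs, threshold=0.9), 4))  # 0.0533

    # Hypothetical proteins and the probabilities of their supporting peptides.
    evidence = {"PROT_A": [0.9, 0.5], "PROT_B": [0.3]}
    print({k: round(v, 4) for k, v in protein_probabilities(evidence).items()})
    # {'PROT_A': 0.95, 'PROT_B': 0.3}
```

In the example, raising the threshold trades sensitivity for a lower expected FDR. A protein supported by two moderately confident peptides can end up more probable than either peptide alone. That effect is why the independence assumption must be relaxed carefully in practice.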

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Research Project (R01)
Study Section: Special Emphasis Panel (ZCA1-SRRB-9 (O1))
Program Officer: Rodriguez, Henry
University of Michigan Ann Arbor
Schools of Medicine
Ann Arbor
United States
Feltham, Rebecca; Jamal, Kunzah; Tenev, Tencho et al. (2018) Mind Bomb Regulates Cell Death during TNF Signaling by Suppressing RIPK1's Cytotoxic Potential. Cell Rep 23:470-484
Rolland, Delphine; Basrur, Venkatesha; Conlon, Kevin et al. (2014) Global phosphoproteomic profiling reveals distinct signatures in B-cell non-Hodgkin lymphomas. Am J Pathol 184:1331-42
Shteynberg, David; Nesvizhskii, Alexey I; Moritz, Robert L et al. (2013) Combining results of multiple search engines in proteomics. Mol Cell Proteomics 12:2383-93
Ning, Kang; Fermin, Damian; Nesvizhskii, Alexey I (2012) Comparative analysis of different label-free mass spectrometry based protein abundance estimates and their correlation with RNA-Seq gene expression data. J Proteome Res 11:2261-71
Ma, Kelvin; Vitek, Olga; Nesvizhskii, Alexey I (2012) A statistical model-building perspective to identification of MS/MS spectra with PeptideProphet. BMC Bioinformatics 13 Suppl 16:S1
Nesvizhskii, Alexey I (2012) Computational and informatics strategies for identification of specific protein interaction partners in affinity purification mass spectrometry experiments. Proteomics 12:1639-55
Kwon, Taejoon; Choi, Hyungwon; Vogel, Christine et al. (2011) MSblender: A probabilistic approach for integrating peptide identifications from multiple database search engines. J Proteome Res 10:2949-58
Choi, Hyungwon; Larsen, Brett; Lin, Zhen-Yuan et al. (2011) SAINT: probabilistic scoring of affinity purification-mass spectrometry data. Nat Methods 8:70-3
Shteynberg, David; Deutsch, Eric W; Lam, Henry et al. (2011) iProphet: multi-level integrative analysis of shotgun proteomic data improves peptide and protein identification rates and error estimates. Mol Cell Proteomics 10:M111.007690
Fermin, Damian; Basrur, Venkatesha; Yocum, Anastasia K et al. (2011) Abacus: a computational tool for extracting and pre-processing spectral count data for label-free quantitative proteomic analysis. Proteomics 11:1340-5

The list above shows the 10 most recent of 34 publications.