Robust Accurate Identification of peptides from tandem mass spectrometry data

Yu, Yi-Kuo

Abstract

Although it has been studied intensively, assigning accurate statistical significance to peptide identifications remains challenging. Many peptide identification methods search a database and assign an E-value to each peptide hit. However, the E-values reported by different methods do not agree with each other, and none of them agrees with the textbook definition of the E-value. This hinders combining search results from different methods, particularly when one wishes to combine methods with user-assigned weights. Over the past year, one of our major efforts has been to develop a statistical approach that properly accounts, during data analysis, for proteotypic peptides, that is, peptides that are consistently observed in mass-spectrometry-based proteomics experiments. We have shown that proteotypic information improves retrieval performance, provided that it is incorporated into the database with sufficient quality control. We have published these results in the Journal of Proteomics (doi:10.1016/j.jprot.2010.10.005). Another direction we have embarked on is to use the score statistics of all possible peptides in various applications. In 2008, we showed that it is possible to score trillions of trillions of peptides to form the score histogram of all possible peptides for a given MS spectrum and an additive scoring function. In the past year, we have turned this somewhat theoretical result to practical use by re-expressing several well-known scoring functions from computational proteomics in additive form, thereby obtaining unified score statistics for those scoring functions. A main difficulty was learning how each scoring function pre-processes the query spectrum, a critical step that largely determines the final score of each candidate peptide. To do so, we had to dig into other analysis programs to extract their heuristic filtering rules.
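To illustrate why an additive scoring function makes the all-possible-peptide score histogram tractable, here is a minimal sketch in Python. It is not the RAId_aPS implementation: the four-letter alphabet, the integer (nominal) residue masses used as mass bins, and the `peak_score` table are simplifying assumptions made for illustration only. Because the score is a sum of per-prefix-mass contributions, a dynamic program over mass can count every peptide of a given total mass by score without enumerating the peptides.

```python
from collections import Counter

# Toy alphabet with nominal integer residue masses (assumption: real tools
# use all 20 amino acids and a finer mass grid).
RESIDUE_MASSES = {"G": 57, "A": 71, "S": 87, "V": 99}


def score_histogram(total_mass, peak_score):
    """Count all peptides of a given total residue mass, grouped by score.

    peak_score is a hypothetical table mapping an integer prefix mass to
    the score a peptide earns for having a fragmentation site there.
    The score of a peptide is the sum of these contributions (additive).
    Returns a Counter {score: number of peptides}.
    """
    hist = [Counter() for _ in range(total_mass + 1)]
    hist[0][0] = 1  # empty prefix: one sequence, score 0
    for m in range(1, total_mass + 1):
        gain = peak_score.get(m, 0)  # contribution of a prefix ending at m
        for a in RESIDUE_MASSES.values():
            if a <= m:
                # Extend every sequence of mass m - a by one residue.
                for s, n in hist[m - a].items():
                    hist[m][s + gain] += n
    return hist[total_mass]


def p_value(hist, score):
    """Fraction of all counted peptides scoring at least `score`."""
    total = sum(hist.values())
    return sum(n for s, n in hist.items() if s >= score) / total


hist = score_histogram(128, {57: 1})
print(dict(hist), p_value(hist, 1))
```

In this toy case only two sequences have mass 128 (GA and AG), and only GA has a prefix at mass 57, so the histogram splits evenly between scores 1 and 0. The cost grows with the number of mass bins times the number of distinct scores, not with the number of peptides.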
Having extracted these pre-processing rules, we built an application tool that allows (1) combining search results using all-possible-peptide score statistics and (2) reassigning E-values. Our new application, RAId_aPS, is now available on our group website, and the results have been published in PLoS One (doi:10.1371/journal.pone.0015438). When prior knowledge is available, it is often desirable to weight search methods differently before combining their search results. In one of our 2008 publications, we provided a way to combine search results democratically, that is, with equal weights. When the weights differ, a numerical instability arises if some of the weights are nearly degenerate (nearly equal). In the past year, we have devised a mathematical framework that completely eliminates this instability. This work was recently published in PLoS One (doi:10.1371/journal.pone.0022647).
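The instability with nearly degenerate weights can be seen in a short Python sketch. It does not reproduce the framework described above. It assumes independent P-values that are uniform under the null hypothesis, and it uses the standard closed form for the tail of a sum of independent exponential variables with distinct means (here the weights). The function names are illustrative. The closed form divides by differences of weights, so it fails when two weights coincide and loses precision when they nearly coincide.

```python
import math


def weighted_combined_p(pvals, weights):
    """P-value of tau = prod(p_i ** w_i) for independent uniform p_i.

    Closed form valid only for distinct positive weights:
        P = sum_i  w_i**(n-1) / prod_{j != i}(w_i - w_j) * tau**(1 / w_i)
    """
    n = len(weights)
    tau = math.prod(p ** w for p, w in zip(pvals, weights))
    total = 0.0
    for i, wi in enumerate(weights):
        # This product vanishes when two weights are equal and is tiny
        # when they are nearly equal, which amplifies rounding error.
        denom = math.prod(wi - wj for j, wj in enumerate(weights) if j != i)
        total += wi ** (n - 1) / denom * tau ** (1.0 / wi)
    return total


def fisher_combined_p(pvals):
    """Equal-weight (Fisher) combination: the limit of equal weights."""
    s = -sum(math.log(p) for p in pvals)
    n = len(pvals)
    return math.exp(-s) * sum(s ** k / math.factorial(k) for k in range(n))


p = [0.01, 0.02]
print(fisher_combined_p(p))
print(weighted_combined_p(p, [1.0, 1.001]))  # close to the Fisher value
print(weighted_combined_p(p, [1.0, 1.0 + 1e-15]))  # dominated by rounding error
```

With exactly equal weights the closed form raises `ZeroDivisionError`, and with weights that differ by about one part in 10^15 the large terms of opposite sign cancel almost completely, so the printed value should not be trusted. A stable treatment has to handle the equal-weight and nearly-equal-weight cases without this cancellation.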

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Investigator-Initiated Intramural Research Projects (ZIA)
Project #: 1ZIALM092404-08
Application #: 8344962
Support Year: 8
Fiscal Year: 2011
Total Cost: $399,742

Institution

Name: National Library of Medicine

Publications

Joyce, Brendan; Lee, Danny; Rubio, Alex et al. (2018) A graphical user interface for RAId, a knowledge integrated proteomics analysis suite with accurate statistics. BMC Res Notes 11:182

Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y et al. (2018) Rapid Classification and Identification of Multiple Microorganisms with Accurate Statistical Significance via High-Resolution Tandem Mass Spectrometry. J Am Soc Mass Spectrom 29:1721-1737

Alves, Gelio; Yu, Yi-Kuo (2016) Confidence assignment for mass spectrometry based peptide identifications via the extreme value distribution. Bioinformatics 32:2642-9

Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y et al. (2016) Identification of Microorganisms by High Resolution Tandem Mass Spectrometry with Accurate Statistical Significance. J Am Soc Mass Spectrom 27:194-210

Hamaneh, Mehdi B; Yu, Yi-Kuo (2015) DeCoaD: determining correlations among diseases using protein interaction networks. BMC Res Notes 8:226

Hamaneh, Mehdi Bagheri; Haber, Jonah; Yu, Yi-Kuo (2015) Analytical solution and scaling of fluctuations in complex networks traversed by damped, interacting random walkers. Phys Rev E Stat Nonlin Soft Matter Phys 92:052803

Alves, Gelio; Yu, Yi-Kuo (2015) Mass spectrometry-based protein identification with accurate statistical significance assignment. Bioinformatics 31:699-706

Alves, Gelio; Ogurtsov, Aleksey Y; Yu, Yi-Kuo (2014) Molecular Isotopic Distribution Analysis (MIDAs) with adjustable mass accuracy. J Am Soc Mass Spectrom 25:57-70

Hamaneh, Mehdi Bagheri; Yu, Yi-Kuo (2014) Relating diseases by integrating gene associations and information flow through protein interaction network. PLoS One 9:e110936

Alves, Gelio; Yu, Yi-Kuo (2014) Accuracy evaluation of the unified P-value from combining correlated P-values. PLoS One 9:e91225

Showing the most recent 10 out of 26 publications
