Robust Accurate Identification of peptides from tandem mass spectrometry data

Yu, Yi-Kuo

Abstract

Although peptide identification has been studied intensively, assigning accurate statistical significance to identified peptides remains challenging. Many peptide identification methods search a database and assign an E-value to each peptide hit. However, the E-values reported by different methods do not agree with each other, and few of them, if any, agree with the textbook definition of the E-value. This hinders combining search results from different methods, particularly when one wishes to combine methods with user-assigned weights. When prior knowledge is available, it is often desirable to weight search methods differently before combining their search results. In one of our 2008 publications, we provided a way to combine search results democratically, that is, with equal weights. When different weights are present, an instability issue occurs if some of the weights are nearly degenerate. In 2011, we devised a mathematical framework that completely eliminates this possible instability. We have recently extended this idea to the case in which the methods being combined are correlated. As a first step in this direction, we assessed the accuracy of combined P-values obtained using various published methods. A manuscript reporting the results is in preparation for submission to PLoS One.
In the past year we also completed a fast and accurate algorithm for computing the molecular isotopic mass distribution from a given elemental composition. Unlike existing algorithms, which provide only a coarse or only a fine spectrum, our method provides both coarse and fine resolution of the spectrum. In addition, under several testing criteria, our method appears to be among the best performing ones. We have made a web service available on our group website (www.ncbi.nlm.nih.gov/CBBresearch/Yu/midas/index.html). A paper describing the method was recently accepted by the Journal of the American Society for Mass Spectrometry.
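To make the isotopic-distribution problem concrete, the following is a minimal sketch of the coarse (nominal-mass) case only, computed by repeated polynomial convolution. It is not the algorithm described above, which the abstract does not specify and which also covers fine resolution. The function names, the `max_len` truncation parameter, and the abundance table (approximate natural abundances, listed for illustration) are assumptions.

```python
# Approximate natural isotope abundances, indexed by the number of extra
# neutrons relative to the lightest stable isotope (illustrative values).
ISOTOPES = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425, 0.0, 0.0001],
}


def convolve(a, b, max_len):
    """Convolve two distributions, keeping only the first max_len entries."""
    out = [0.0] * min(len(a) + len(b) - 1, max_len)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < len(out):
                out[i + j] += x * y
    return out


def power(dist, n, max_len):
    """Distribution for n atoms of one element, via repeated squaring."""
    result = [1.0]
    base = dist
    while n > 0:
        if n & 1:
            result = convolve(result, base, max_len)
        base = convolve(base, base, max_len)
        n >>= 1
    return result


def coarse_isotopic_distribution(composition, max_len=10):
    """Probability of 0, 1, 2, ... extra neutrons for an elemental composition.

    composition maps element symbols to atom counts, e.g. {"C": 6, "H": 12}.
    """
    total = [1.0]
    for element, count in composition.items():
        per_element = power(ISOTOPES[element], count, max_len)
        total = convolve(total, per_element, max_len)
    return total


if __name__ == "__main__":
    # Glucose, C6H12O6: entry k is the probability of k extra neutrons.
    for k, prob in enumerate(coarse_isotopic_distribution({"C": 6, "H": 12, "O": 6})):
        print(k, prob)
```

Truncating each convolution at `max_len` does not change the retained entries, because entry k of a convolution depends only on entries with index at most k. A fine-resolution spectrum would additionally need the exact isotope masses, which this sketch leaves out.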
To better delineate the difference between our statistical approach (stratified statistics) and other methods, we have also written a paper describing how stratification of the search space can lead to better retrieval performance. This paper has been published in the Journal of Proteome Research.
Because most protein association (complex) data are obtained from pull-down experiments, in which mass spectrometers are used to identify the co-pulled-down partner proteins, our efforts in proteomics also include the investigation of protein-protein interaction networks and the organization of association data. This work has led to a few tools suitable for network exploration and hypothesis formation, and to a new method that organizes protein associations into a directed acyclic graph, which alleviates the effects of non-specific binding, false-positive associations, and false negatives. We are currently preparing a manuscript for submission to PLoS Computational Biology.
Over the past year, one of our major efforts has been to understand the physical mechanisms of peptide fragmentation so as to predict the likelihood of observing a given peak. If successful, this could significantly improve peptide scoring by providing peptide-specific peak filtering. The results indicate that the dissociation energy can be a good indicator of the observability of certain peaks, at least for short peptides with non-polar side chains. The manuscript was recently published in Rapid Communications in Mass Spectrometry.
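The weighted combination of P-values discussed earlier in this abstract, and the instability that arises for nearly degenerate weights, can be illustrated with a minimal sketch. It uses the textbook closed forms for independent, uniformly distributed P-values: Fisher's formula for equal weights, and the weighted-product formula, which is valid only when all weights are distinct. These are stand-ins chosen for illustration, not the 2008 or 2011 frameworks themselves, and the function names are hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Equal-weight combination: P(prod p_i <= tau) for independent uniforms."""
    tau = math.prod(p_values)
    x = -math.log(tau)
    term, total = 1.0, 0.0
    # tau * sum_{k=0}^{n-1} x^k / k!
    for k in range(len(p_values)):
        total += term
        term *= x / (k + 1)
    return tau * total


def weighted_combined_p(p_values, weights):
    """Weighted combination: P(prod p_i^w_i <= tau), all weights distinct.

    Uses sum_i w_i^(n-1) / prod_{j != i}(w_i - w_j) * tau^(1/w_i).
    The denominators vanish as two weights approach each other, which is
    where the numerical instability comes from.
    """
    n = len(weights)
    log_tau = sum(w * math.log(p) for p, w in zip(p_values, weights))
    total = 0.0
    for i, wi in enumerate(weights):
        denom = math.prod(wi - wj for j, wj in enumerate(weights) if j != i)
        total += wi ** (n - 1) / denom * math.exp(log_tau / wi)
    return total


if __name__ == "__main__":
    p = [0.5, 0.5]
    print(fisher_combined_p(p))
    # Nearly degenerate weights: two large terms of opposite sign nearly cancel.
    print(weighted_combined_p(p, [1.0, 1.001]))
```

With weights such as 1.0 and 1.001, the weighted formula is still close to the equal-weight result, but it gets there by subtracting two terms that are each hundreds of times larger than the answer. As the weights move closer together, that cancellation loses more floating-point precision, and exactly equal weights cause a division by zero.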

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Investigator-Initiated Intramural Research Projects (ZIA)
Project #: 1ZIALM092404-10
Application #: 8746752
Support Year: 10
Fiscal Year: 2013
Total Cost: $678,028

Institution

Name: National Library of Medicine

Publications

Joyce, Brendan; Lee, Danny; Rubio, Alex et al. (2018) A graphical user interface for RAId, a knowledge integrated proteomics analysis suite with accurate statistics. BMC Res Notes 11:182

Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y et al. (2018) Rapid Classification and Identification of Multiple Microorganisms with Accurate Statistical Significance via High-Resolution Tandem Mass Spectrometry. J Am Soc Mass Spectrom 29:1721-1737

Alves, Gelio; Yu, Yi-Kuo (2016) Confidence assignment for mass spectrometry based peptide identifications via the extreme value distribution. Bioinformatics 32:2642-9

Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y et al. (2016) Identification of Microorganisms by High Resolution Tandem Mass Spectrometry with Accurate Statistical Significance. J Am Soc Mass Spectrom 27:194-210

Hamaneh, Mehdi B; Yu, Yi-Kuo (2015) DeCoaD: determining correlations among diseases using protein interaction networks. BMC Res Notes 8:226

Hamaneh, Mehdi Bagheri; Haber, Jonah; Yu, Yi-Kuo (2015) Analytical solution and scaling of fluctuations in complex networks traversed by damped, interacting random walkers. Phys Rev E Stat Nonlin Soft Matter Phys 92:052803

Alves, Gelio; Yu, Yi-Kuo (2015) Mass spectrometry-based protein identification with accurate statistical significance assignment. Bioinformatics 31:699-706

Alves, Gelio; Ogurtsov, Aleksey Y; Yu, Yi-Kuo (2014) Molecular Isotopic Distribution Analysis (MIDAs) with adjustable mass accuracy. J Am Soc Mass Spectrom 25:57-70

Hamaneh, Mehdi Bagheri; Yu, Yi-Kuo (2014) Relating diseases by integrating gene associations and information flow through protein interaction network. PLoS One 9:e110936

Alves, Gelio; Yu, Yi-Kuo (2014) Accuracy evaluation of the unified P-value from combining correlated P-values. PLoS One 9:e91225

Showing the most recent 10 out of 26 publications
