Although it has received intense attention, the statistical accuracy of peptide identification remains challenging. Many peptide identification methods search databases and assign E-values to peptide hits; however, the E-values reported by different methods do not agree with one another, and few of them, if any, agree with the textbook definition of the E-value (the expected number of random hits scoring at least as well as the observed one). This hinders combining search results from different methods, particularly when one wishes to combine methods with user-assigned weights. When prior knowledge is available, it is often desirable to weight search methods differently before combining their search results. In one of our 2008 publications, we provided a way to combine search results democratically, that is, with equal weights. When the weights differ, a numerical instability arises if some of them are nearly degenerate. In 2011, we devised a mathematical framework that completely eliminates this instability. We have recently extended this idea to the case in which the methods being combined are correlated. As a first step in this direction, we assessed the accuracy of combined P-values obtained with various published methods. A manuscript reporting the results is being prepared for submission to PLoS One.
In the past year we also completed a fast and accurate algorithm for computing the isotopic distribution of molecular mass from a given elemental composition. Whereas existing algorithms and methods provide only a coarse- or only a fine-resolution spectrum, our method provides both. In addition, under several testing criteria, our method appears to be among the best performing. We have made a web service available on our group website. The method is also described in a paper recently accepted by the Journal of the American Society for Mass Spectrometry.
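Why nearly degenerate weights cause trouble can be seen in a minimal sketch of the classical closed form for combining independent P-values, assuming the equal-weight ("democratic") case corresponds to Fisher's product-of-P-values method and the weighted case to Good's formula. For the statistic tau = p_1^w_1 × ... × p_n^w_n, Good's formula gives the combined P-value as the sum over i of Lambda_i × tau^(1/w_i), where Lambda_i is the product over j ≠ i of w_i / (w_i − w_j). As two weights approach each other, these coefficients grow without bound and must cancel almost exactly. The function names are hypothetical, and this sketch is not the stabilized 2011 framework described above.

```python
import math


def fisher_combined(pvalues):
    """Equal-weight (Fisher) combination: P = tau * sum_{k<n} (-ln tau)^k / k!."""
    n = len(pvalues)
    # -ln(tau) is a sum of -ln(p_i); each term is Exp(1) under the null hypothesis
    y = -sum(math.log(p) for p in pvalues)
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= y / k
        total += term
    return math.exp(-y) * total


def good_combined(pvalues, weights):
    """Weighted combination via Good's closed form; needs pairwise-distinct weights."""
    # -ln(tau) = sum_i w_i * (-ln p_i): a sum of exponentials with means w_i
    y = -sum(w * math.log(p) for p, w in zip(pvalues, weights))
    total = 0.0
    for i, wi in enumerate(weights):
        coeff = 1.0
        for j, wj in enumerate(weights):
            if j != i:
                # This factor diverges as w_j approaches w_i
                coeff *= wi / (wi - wj)
        total += coeff * math.exp(-y / wi)
    return total


if __name__ == "__main__":
    p = [0.01, 0.04, 0.2]
    print(fisher_combined(p))                               # about 0.0044
    print(good_combined(p, [1.0, 2.0, 3.0]))                # well-separated weights
    print(good_combined(p, [1.0, 1.0 + 1e-4, 1.0 + 2e-4]))  # close to the Fisher value
    print(good_combined(p, [1.0, 1.0 + 1e-9, 1.0 + 2e-9]))  # rounding error can swamp the result
```

With weights 1e-9 apart, each term is of order 1e13 while the true sum is of order 1e-3, so ordinary double-precision rounding is enough to destroy the answer. This is the instability that a reformulation with controllable accuracy is meant to remove.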
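The coarse-resolution half of the isotopic distribution problem can be illustrated with a minimal sketch: bin each element's isotopes by nominal-mass offset (number of extra neutrons), raise each element's distribution to its atom count by repeated squaring, and convolve the elements together. The abundance table holds approximate natural values assumed for illustration, the function names are hypothetical, and this is not the MIDAs algorithm itself, which also yields fine-resolution spectra with adjustable mass accuracy.

```python
# Approximate natural isotope abundances (assumed for illustration),
# indexed by nominal-mass offset from the lightest isotope.
ABUNDANCE = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425, 0.0, 0.0001],
}


def convolve(a, b, max_len):
    """Offset distribution of two independent contributions, truncated to max_len."""
    out = [0.0] * min(len(a) + len(b) - 1, max_len)
    for i, x in enumerate(a):
        if i >= max_len:
            break
        for j, y in enumerate(b):
            if i + j >= max_len:
                break
            out[i + j] += x * y
    return out


def power(dist, n, max_len):
    """Offset distribution for n atoms of one element, by repeated squaring."""
    result = [1.0]
    base = dist
    while n > 0:
        if n & 1:
            result = convolve(result, base, max_len)
        base = convolve(base, base, max_len)
        n >>= 1
    return result


def coarse_isotopic_distribution(composition, max_len=10):
    """Probabilities of nominal-mass offsets 0, +1, +2, ... for an elemental formula."""
    total = [1.0]
    for element, count in composition.items():
        total = convolve(total, power(ABUNDANCE[element], count, max_len), max_len)
    return total


if __name__ == "__main__":
    glucose = coarse_isotopic_distribution({"C": 6, "H": 12, "O": 6})
    for offset, prob in enumerate(glucose[:4]):
        print(f"M+{offset}: {prob:.4f}")
```

Binning by nominal offset is what makes the result coarse: isotopologues such as 13C versus 2H substitutions land in the same M+1 bin. A fine-resolution spectrum would have to track exact isotope masses, which this sketch does not attempt.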
To better delineate the difference between our statistical approach (stratified statistics) and other methods, we also wrote a manuscript describing how stratifying the search space can improve retrieval performance; this paper is published in the Journal of Proteome Research.
Because most protein association (complex) data come from pull-down experiments in which mass spectrometry is used to identify the proteins pulled down together, our proteomics efforts also include investigating protein-protein interaction networks and organizing association data. This work has led to several tools suitable for network exploration and hypothesis generation, as well as a new method that organizes protein associations into a directed acyclic graph, which alleviates the effects of non-specific binding and false-positive associations as well as false negatives. We are currently preparing a manuscript for submission to PLoS Computational Biology.
Over the past year, one of our major efforts has been to understand the physical mechanisms of peptide fragmentation in order to predict the likelihood of observing a given peak. If successful, this could significantly improve peptide scoring by enabling peptide-specific peak filtering. The results indicate that dissociation energy can be a good indicator of the observability of certain peaks, at least for short peptides with non-polar side chains. The manuscript was recently published in Rapid Communications in Mass Spectrometry.

National Library of Medicine
Alves, Gelio; Yu, Yi-Kuo (2016) Confidence assignment for mass spectrometry based peptide identifications via the extreme value distribution. Bioinformatics 32:2642-9
Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y et al. (2016) Identification of Microorganisms by High Resolution Tandem Mass Spectrometry with Accurate Statistical Significance. J Am Soc Mass Spectrom 27:194-210
Hamaneh, Mehdi Bagheri; Haber, Jonah; Yu, Yi-Kuo (2015) Analytical solution and scaling of fluctuations in complex networks traversed by damped, interacting random walkers. Phys Rev E Stat Nonlin Soft Matter Phys 92:052803
Alves, Gelio; Yu, Yi-Kuo (2015) Mass spectrometry-based protein identification with accurate statistical significance assignment. Bioinformatics 31:699-706
Hamaneh, Mehdi B; Yu, Yi-Kuo (2015) DeCoaD: determining correlations among diseases using protein interaction networks. BMC Res Notes 8:226
Alves, Gelio; Ogurtsov, Aleksey Y; Yu, Yi-Kuo (2014) Molecular Isotopic Distribution Analysis (MIDAs) with adjustable mass accuracy. J Am Soc Mass Spectrom 25:57-70
Alves, Gelio; Yu, Yi-Kuo (2014) Accuracy evaluation of the unified P-value from combining correlated P-values. PLoS One 9:e91225
Stojmirovic, Aleksandar; Yu, Yi-Kuo (2014) Building a hierarchical organization of protein complexes out of protein association data. PLoS One 9:e100098
Hamaneh, Mehdi Bagheri; Yu, Yi-Kuo (2014) Relating diseases by integrating gene associations and information flow through protein interaction network. PLoS One 9:e110936
Obolensky, O I; Wu, Wells W; Shen, Rong-Fong et al. (2013) Using dissociation energies to predict observability of b- and y-peaks in mass spectra of short peptides. II. Results for hexapeptides with non-polar side chains. Rapid Commun Mass Spectrom 27:152-6

The 10 most recent of 18 publications are listed.