Although it has been studied intensively, statistically accurate peptide/protein identification remains challenging. Many peptide identification methods search a database and assign an E-value to each peptide hit. However, the E-values reported by different methods do not agree with each other, and few of them, if any, agree with the textbook definition of the E-value (the expected number of random hits scoring as well as or better than the observed hit). This hinders combining search results from different methods, in particular when one wishes to combine methods with user-assigned weights. When prior knowledge is available, it is often desirable to weight search methods differently before combining their search results. We provided a way to combine search results democratically, that is, with equal weights, in one of our 2008 publications. When different weights are present, an instability issue arises if some of the weights are nearly degenerate. In 2011 we devised a mathematical framework that completely eliminates this possible instability. We have recently expanded this idea to cover the case in which the methods being considered are correlated. As a first step in this direction, we assessed the accuracy of combined P-values obtained with various published methods. The results were recently published in PLoS One.
During the past year we also completed a fast and accurate algorithm for computing the molecular mass isotopic distribution from a given elemental composition. In contrast to existing algorithms, which provide either a coarse or a fine spectrum, our method provides both coarse and fine resolution of the spectrum. In addition, under several testing criteria, our method appears to be among the best performing. We have made a web service available on our group website www.ncbi.nlm.nih.gov/CBBresearch/Yu/midas/index.html. The method is also described in a paper recently published in the Journal of the American Society for Mass Spectrometry.
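To make the notion of a coarse-resolution isotopic distribution concrete, the following is a minimal sketch that multiplies per-element isotope polynomials and groups peaks by the number of extra neutrons. It is a generic textbook convolution approach, not the algorithm behind the web service described above. The abundance table holds approximate natural abundances, and the function names (`convolve`, `poly_power`, `coarse_isotopic_distribution`) are hypothetical names chosen for illustration.

```python
# Approximate natural isotope abundances (assumed values for illustration),
# listed by number of extra neutrons relative to the lightest isotope: 0, +1, +2.
ABUNDANCE = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
}


def convolve(a, b):
    """Multiply two polynomials given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def poly_power(p, n):
    """Raise polynomial p to the n-th power by repeated squaring."""
    result = [1.0]
    base = list(p)
    while n > 0:
        if n & 1:
            result = convolve(result, base)
        n >>= 1
        if n:
            base = convolve(base, base)
    return result


def coarse_isotopic_distribution(composition, max_peaks=6):
    """Return probabilities of the M, M+1, M+2, ... nominal-mass peaks.

    composition maps element symbols to atom counts, e.g. {"C": 6, "H": 12, "O": 6}.
    """
    dist = [1.0]
    for element, count in composition.items():
        dist = convolve(dist, poly_power(ABUNDANCE[element], count))
    return dist[:max_peaks]


if __name__ == "__main__":
    # Glucose, C6H12O6: print the first few coarse peaks.
    for k, prob in enumerate(coarse_isotopic_distribution({"C": 6, "H": 12, "O": 6})):
        print(f"M+{k}: {prob:.6f}")
```

A fine-resolution spectrum would additionally track the exact mass of each isotopic combination rather than only the neutron count, which is where the cost and accuracy trade-offs between methods arise.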
Since most protein association (complex) data are obtained from pull-down experiments in which mass spectrometry is used to identify co-purified partner proteins, our efforts in proteomics also include investigation of protein-protein interaction networks and the organization of association data. Our investigation of these problems has led to the development of several tools suitable for network exploration and hypothesis forming, and to a new method of organizing protein associations into a directed acyclic graph, which alleviates the effects of non-specific binding, false positive associations, and false negatives. The first set of results was recently published in PLoS One. During the past year, one of our major efforts has been to understand the physical mechanisms of peptide fragmentation so as to predict the probability of observing a given peak. If successful, this can significantly improve peptide scoring by providing peptide-specific peak filtering. The results indicate that the dissociation energy can be a good indicator of the observability of certain peaks, at least for singly charged short peptides with non-polar side chains. Since most peptides retain more than one positive charge in MS experiments, we are currently extending our investigation to peptides carrying two positive charges. We have also worked on a protein identification method that combines weighted P-values of evidence peptides. This new method aims to solve the long-standing problem of precise type-I error control in protein identification. The results obtained so far are very encouraging and indicate that precise control of type-I error is possible.
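To illustrate what combining independent P-values with and without weights involves, the sketch below implements two classical closed forms: Fisher's method for equal weights, and the weighted-product formula often attributed to I. J. Good for distinct weights. These are textbook formulas shown under the assumption of independent, uniformly distributed null P-values; they are not the framework developed in this project, and the function names are hypothetical. The weighted formula divides by differences of weights, which is why it becomes numerically unstable when weights are nearly degenerate.

```python
import math


def fisher_combined_p(pvalues):
    """Equal-weight combination: P(prod p_i <= q) = q * sum_{k<n} (-ln q)^k / k!."""
    n = len(pvalues)
    t = -sum(math.log(p) for p in pvalues)  # t = -ln(product of P-values)
    term, total = 1.0, 0.0
    for k in range(n):
        total += term
        term *= t / (k + 1)
    return math.exp(-t) * total


def good_weighted_p(pvalues, weights):
    """Weighted combination for pairwise distinct positive weights.

    With Q = prod p_i^{w_i}, the combined P-value is
    sum_i [w_i^(n-1) / prod_{j != i} (w_i - w_j)] * Q^(1/w_i).
    """
    if len(set(weights)) != len(weights):
        raise ValueError("weights must be pairwise distinct for this closed form")
    n = len(weights)
    s = -sum(w * math.log(p) for p, w in zip(pvalues, weights))  # s = -ln Q
    total = 0.0
    for i, wi in enumerate(weights):
        denom = 1.0
        for j, wj in enumerate(weights):
            if j != i:
                denom *= wi - wj  # tiny when weights are nearly equal
        total += wi ** (n - 1) / denom * math.exp(-s / wi)
    return total


if __name__ == "__main__":
    ps = [0.01, 0.03]
    print("Fisher:", fisher_combined_p(ps))
    for eps in (1e-1, 1e-3, 1e-6, 1e-9, 1e-12):
        # Large terms of opposite sign cancel as eps shrinks, degrading precision.
        print(eps, good_weighted_p(ps, [1.0, 1.0 + eps]))
```

As the weight difference shrinks, the individual terms grow like the inverse of that difference while their sum stays small, so floating-point cancellation progressively erodes the result. This is the kind of instability that motivates a formulation that remains well behaved for nearly equal weights.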

Support Year: 11
Fiscal Year: 2014
Name: National Library of Medicine
Joyce, Brendan; Lee, Danny; Rubio, Alex et al. (2018) A graphical user interface for RAId, a knowledge integrated proteomics analysis suite with accurate statistics. BMC Res Notes 11:182
Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y et al. (2018) Rapid Classification and Identification of Multiple Microorganisms with Accurate Statistical Significance via High-Resolution Tandem Mass Spectrometry. J Am Soc Mass Spectrom 29:1721-1737
Alves, Gelio; Yu, Yi-Kuo (2016) Confidence assignment for mass spectrometry based peptide identifications via the extreme value distribution. Bioinformatics 32:2642-9
Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y et al. (2016) Identification of Microorganisms by High Resolution Tandem Mass Spectrometry with Accurate Statistical Significance. J Am Soc Mass Spectrom 27:194-210
Hamaneh, Mehdi B; Yu, Yi-Kuo (2015) DeCoaD: determining correlations among diseases using protein interaction networks. BMC Res Notes 8:226
Hamaneh, Mehdi Bagheri; Haber, Jonah; Yu, Yi-Kuo (2015) Analytical solution and scaling of fluctuations in complex networks traversed by damped, interacting random walkers. Phys Rev E Stat Nonlin Soft Matter Phys 92:052803
Alves, Gelio; Yu, Yi-Kuo (2015) Mass spectrometry-based protein identification with accurate statistical significance assignment. Bioinformatics 31:699-706
Alves, Gelio; Ogurtsov, Aleksey Y; Yu, Yi-Kuo (2014) Molecular Isotopic Distribution Analysis (MIDAs) with adjustable mass accuracy. J Am Soc Mass Spectrom 25:57-70
Hamaneh, Mehdi Bagheri; Yu, Yi-Kuo (2014) Relating diseases by integrating gene associations and information flow through protein interaction network. PLoS One 9:e110936
Alves, Gelio; Yu, Yi-Kuo (2014) Accuracy evaluation of the unified P-value from combining correlated P-values. PLoS One 9:e91225

Showing the most recent 10 out of 26 publications