Although heavily studied, the statistical accuracy of peptide/protein identification remains a challenge. Many peptide identification methods search a database and assign an E-value to each peptide hit. However, the E-values reported by different methods do not agree with one another, and few of them, if any, agree with the textbook definition of the E-value. This hinders combining search results from different methods, particularly when one wishes to combine methods with user-assigned weights. When prior knowledge is available, it is often desirable to weight search methods differently before combining their results. In one of our 2008 publications, we provided a way to combine search results democratically, that is, with equal weights. When the weights differ, an instability arises if some of the weights are nearly degenerate. In 2011, we devised a mathematical framework that completely eliminates this possible instability. Over the past year, one of our main efforts was to design a protein identification method that combines the weighted P-values of evidence peptides. This new method solves the long-standing problem of precise type-I error control in protein identification. In addition, it correctly reports the proportion of false discoveries, an indication of accurate type-II error control. The results are very encouraging and were recently published in Bioinformatics. We have also made the protein identification function available in the RAId web service on our group website (www.ncbi.nlm.nih.gov/CBBresearch/Yu/raid_dbs/index.html).

Over the past year, we also finished the first phase of a large collaborative project, involving scientists at NHLBI and the Clinical Center, on pathogen identification using mass spectrometry. The fundamental idea is to use each pathogen's peptidome to represent that pathogen. Provided that the statistical significance assignment is accurate, mass spectrometry analysis allows one to correctly rank species/genera according to the similarity between their peptidomes and the identified peptides. Here again, we must weight the evidence peptides associated with a given species/genus, because one peptide often maps to multiple species/genera. Our results are very encouraging, and we have submitted a manuscript to the Journal of the American Society for Mass Spectrometry for consideration for publication.

Because different diseases may share similar causes while similar diseases may arise from different origins, we believe it is important to characterize disease relationships from a different perspective, that of the protein-protein interaction network. With this in mind, we downloaded all diseases, along with their associated genes, stored in the Comparative Toxicogenomics Database (CTD) and analyzed disease-disease relations based on the similarity between their protein weight vectors. Each vector is obtained by using the disease genes as the sources and sinks in an interaction network containing proteins with documented interactions. We found some very interesting results that complement the disease relations found via phenotypic annotation. These results were recently published in PLoS One. We have also implemented a web service, DeCoaD, that allows users to look for diseases similar to an input disease based on interaction networks. The web link is www.ncbi.nlm.nih.gov/CBBresearch/Yu/mn/DeCoaD/index.html, and the implementation details were recently published in BMC Research Notes.
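
To make the weighted combination of P-values and the instability for nearly degenerate weights more concrete, the sketch below uses a classical closed form for the weighted product of independent, uniformly distributed P-values with pairwise distinct weights. It is an illustration only and is not taken from the project's publications or from RAId; the function names `combine_weighted_pvalues` and `max_abs_coefficient` are hypothetical, and independence of the P-values is an assumption.

```python
import math


def _check_weights(weights):
    # The closed form below requires positive, pairwise distinct weights.
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be a non-empty list of positive numbers")
    if len(set(weights)) != len(weights):
        raise ValueError("weights must be pairwise distinct for this closed form")


def _coefficient(weights, i):
    # Coefficient prod_{j != i} w_i / (w_i - w_j) of the i-th term.
    coeff = 1.0
    for j, w_j in enumerate(weights):
        if j != i:
            coeff *= weights[i] / (weights[i] - w_j)
    return coeff


def combine_weighted_pvalues(pvalues, weights):
    """Unified P-value for tau = prod(p_i ** w_i), assuming independence.

    Under the null, -ln(p_i) is exponentially distributed, so -ln(tau) is a
    sum of exponentials with distinct scales w_i, which gives
        P(T <= tau) = sum_i [prod_{j != i} w_i / (w_i - w_j)] * tau ** (1 / w_i).
    """
    if len(pvalues) != len(weights):
        raise ValueError("pvalues and weights must have the same length")
    _check_weights(weights)
    if any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("each P-value must lie in (0, 1]")

    log_tau = sum(w * math.log(p) for p, w in zip(pvalues, weights))
    total = sum(
        _coefficient(weights, i) * math.exp(log_tau / w_i)
        for i, w_i in enumerate(weights)
    )
    # Clamp to [0, 1] to absorb floating-point round-off.
    return min(1.0, max(0.0, total))


def max_abs_coefficient(weights):
    """Largest coefficient magnitude; it diverges as two weights approach."""
    _check_weights(weights)
    return max(abs(_coefficient(weights, i)) for i in range(len(weights)))
```

With well-separated weights such as `[1.0, 2.0]` the coefficients are of order one, whereas for `[1.0, 1.0 + 1e-8]` they are of order 1e8 with alternating signs, so the sum suffers severe cancellation. This is one way to see why nearly degenerate weights call for a more careful formulation than the naive closed form.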

Support Year: 12
Fiscal Year: 2015
Name: National Library of Medicine
Joyce, Brendan; Lee, Danny; Rubio, Alex et al. (2018) A graphical user interface for RAId, a knowledge integrated proteomics analysis suite with accurate statistics. BMC Res Notes 11:182
Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y et al. (2018) Rapid Classification and Identification of Multiple Microorganisms with Accurate Statistical Significance via High-Resolution Tandem Mass Spectrometry. J Am Soc Mass Spectrom 29:1721-1737
Alves, Gelio; Yu, Yi-Kuo (2016) Confidence assignment for mass spectrometry based peptide identifications via the extreme value distribution. Bioinformatics 32:2642-9
Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y et al. (2016) Identification of Microorganisms by High Resolution Tandem Mass Spectrometry with Accurate Statistical Significance. J Am Soc Mass Spectrom 27:194-210
Hamaneh, Mehdi B; Yu, Yi-Kuo (2015) DeCoaD: determining correlations among diseases using protein interaction networks. BMC Res Notes 8:226
Hamaneh, Mehdi Bagheri; Haber, Jonah; Yu, Yi-Kuo (2015) Analytical solution and scaling of fluctuations in complex networks traversed by damped, interacting random walkers. Phys Rev E Stat Nonlin Soft Matter Phys 92:052803
Alves, Gelio; Yu, Yi-Kuo (2015) Mass spectrometry-based protein identification with accurate statistical significance assignment. Bioinformatics 31:699-706
Alves, Gelio; Ogurtsov, Aleksey Y; Yu, Yi-Kuo (2014) Molecular Isotopic Distribution Analysis (MIDAs) with adjustable mass accuracy. J Am Soc Mass Spectrom 25:57-70
Hamaneh, Mehdi Bagheri; Yu, Yi-Kuo (2014) Relating diseases by integrating gene associations and information flow through protein interaction network. PLoS One 9:e110936
Alves, Gelio; Yu, Yi-Kuo (2014) Accuracy evaluation of the unified P-value from combining correlated P-values. PLoS One 9:e91225

Showing the most recent 10 out of 26 publications