Although it has been studied intensively, the statistical accuracy of peptide identification remains a challenge. For example, many peptide identification methods search a database and assign an E-value to each peptide hit. Unfortunately, the E-values reported by different methods do not agree with each other, and none of them agrees with the textbook definition of the E-value.

Last year, we developed a new database search method for peptide identification, RAId_DbS, that provides more accurate E-values (statistical significance) than other existing methods. We also showed that, in terms of information retrieval efficiency, RAId_DbS is comparable to or better than the best existing methods. In addition, we developed a new protocol to calibrate the statistics of any database search method. This protocol allows the user to transform the score or E-value reported by a given search method into a standardized E-value derived from the fundamental definition of the E-value. As a consequence, the protocol enables comparison between results obtained from different search methods or analyzed by different laboratories.

Different search methods often report different results for the same query spectrum and the same database. It is therefore advantageous to combine search methods properly in order to achieve better performance. This year, we proposed a protocol for properly combining search methods and showed its effectiveness in improving retrieval accuracy.

Peptide co-elution is another challenge that hinders the promise of large-scale mass spectrometry-based peptide identification. Most search methods are designed with one true peptide per spectrum in mind. However, we showed that, because of the limitations of chromatographic separation, peptide co-elution is inevitable in a large fraction of spectra. In preparation for a new effort to tackle this important issue, we have also studied how well current search methods deal with spectra containing multiple co-eluted peptides.
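As an illustration of what calibrating against the textbook definition can mean, the following minimal Python sketch estimates, for a reported E-value cutoff, the average number of false hits per query spectrum that score at or below that cutoff. Under the textbook definition, this observed average should equal the cutoff itself. The function name, the input format, and the example numbers are hypothetical and chosen only for illustration; this is not the calibration protocol developed in the project, only a sketch of the underlying idea, assuming that the false hits come from searches in which no true peptide is expected (for instance, a random database).

```python
from bisect import bisect_right


def make_calibrator(false_hit_e_values, num_queries):
    """Build a function mapping a reported E-value cutoff to an empirical E-value.

    false_hit_e_values: reported E-values of hits known to be false
        (hypothetical input, e.g. hits from searching a random database).
    num_queries: number of query spectra that were searched.
    """
    if num_queries <= 0:
        raise ValueError("num_queries must be positive")
    sorted_e = sorted(false_hit_e_values)

    def calibrate(reported_e):
        # Count false hits with reported E-value <= cutoff, then average
        # over queries: the expected number of false hits per query.
        return bisect_right(sorted_e, reported_e) / num_queries

    return calibrate


# Hypothetical example: 6 false hits collected over 4 query spectra.
false_hits = [0.02, 0.5, 0.9, 1.5, 3.0, 8.0]
calibrate = make_calibrator(false_hits, num_queries=4)
print(calibrate(1.0))  # 3 false hits at or below 1.0, over 4 queries: 0.75
```

If a method's reported E-values followed the textbook definition, `calibrate(e)` would be close to `e` over the whole range of cutoffs; a systematic deviation indicates how the reported values would need to be rescaled.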

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Intramural Research (Z01)
Project #: 1Z01LM092404-05
Application #: 7735085
Support Year: 5
Fiscal Year: 2008
Total Cost: $293,131
Name: National Library of Medicine
Country: United States
Alves, Gelio; Yu, Yi-Kuo (2008) Statistical Characterization of a 1D Random Potential Problem - with applications in score statistics of MS-based peptide sequencing. Physica A 387:6538-6544
Alves, Gelio; Wu, Wells W; Wang, Guanghui et al. (2008) Enhancing peptide identification confidence by combining search methods. J Proteome Res 7:3102-13
Alves, Gelio; Ogurtsov, Aleksey Y; Kwok, Siwei et al. (2008) Detection of co-eluted peptides using database search methods. Biol Direct 3:27
Alves, Gelio; Ogurtsov, Aleksey Y; Yu, Yi-Kuo (2007) RAId_DbS: peptide identification using database searches with realistic statistics. Biol Direct 2:25
Alves, Gelio; Ogurtsov, Aleksey Y; Wu, Wells W et al. (2007) Calibrating E-values for MS2 database search methods. Biol Direct 2:26
Alves, Gelio; Yu, Yi-Kuo (2005) Robust accurate identification of peptides (RAId): deciphering MS2 data using a structured library search with de novo based statistics. Bioinformatics 21:3726-32
Souza, J A; Yu, Yi-Kuo; Neumeier, J J et al. (2005) Method for analyzing second-order phase transitions: application to the ferromagnetic transition of a polaronic system. Phys Rev Lett 94:207209
Yu, Yi-Kuo; Altschul, Stephen F (2005) The construction of amino acid substitution matrices for the comparison of proteins with non-standard compositions. Bioinformatics 21:902-11
Przytycka, Teresa M; Yu, Yi-Kuo (2004) Scale-free networks versus evolutionary drift. Comput Biol Chem 28:257-64