Proteomics is among the most important areas of research in the post-genomic era. Recent advances in tandem mass spectrometry (MS/MS) have opened the way to large-scale protein identification. The key to mass-spectrometry-based proteomics is peptide sequencing. There are in general two approaches to identifying peptides from tandem mass spectrometry data: library search and de novo sequencing. The major challenge in peptide sequencing, whether by library search or de novo, is to assess the statistical significance of candidate identifications reliably.

Employing scaling theory from statistical physics, we have developed a systematic method for assigning statistical significance. A heuristic version of this assignment is currently implemented in RAId, a coherent method we developed to identify peptides from their tandem mass spectrometry data. RAId performs a novel de novo sequencing followed by a search in a peptide library that we constructed. Because the noise in a spectrum depends on the experimental conditions, the instrument used, and many other factors, it cannot be predicted even when the peptide sequence is known; its characteristics can be uncovered only once a spectrum is given. Our de novo sequencing step therefore provides spectrum-specific background score statistics for the library search. When the database search fails to return significant hits, the top-ranking de novo sequences become candidates for new peptides that are not yet in the database.

Although RAId has been shown to perform well on high-resolution spectra, its performance on low-resolution data is not yet satisfactory. Over the past year, our goal has been to enable RAId to handle such data. We have developed an efficient algorithm that scores all possible four-residue (4-letter) tags at both termini of the peptide. For low-resolution data, we currently generate only de novo tags instead of full sequences, a strategy that has proven very effective. We are currently investigating the possibility of incorporating post-translational modifications.
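
The role of a spectrum-specific background can be illustrated with a minimal Python sketch. It assumes that the scores of one spectrum's de novo candidate sequences can serve directly as an empirical background, and converts a library-hit score into a P-value (the fraction of background scores at least as high, with a +1 pseudocount) and an E-value (P-value times the number of candidates compared). This empirical tail count is a simple stand-in, not the scaling-theory-based assignment used in RAId; the function names, the pseudocount, and the example scores are illustrative assumptions.

```python
from bisect import bisect_left
from typing import Sequence


def empirical_p_value(score: float, background: Sequence[float]) -> float:
    """Estimate P(S >= score) from spectrum-specific background scores.

    Illustrative stand-in only: RAId's scaling-theory-based assignment is not
    reproduced here. The +1 pseudocount keeps the estimate above zero when the
    score exceeds every background score.
    """
    ordered = sorted(background)
    # Index of the first background score >= score; everything from there on counts.
    n_at_least = len(ordered) - bisect_left(ordered, score)
    return (n_at_least + 1) / (len(ordered) + 1)


def e_value(score: float, background: Sequence[float], n_candidates: int) -> float:
    """Expected number of chance hits scoring >= score among n_candidates."""
    return n_candidates * empirical_p_value(score, background)


if __name__ == "__main__":
    # Hypothetical de novo scores for a single spectrum.
    de_novo_scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
    print(empirical_p_value(8.5, de_novo_scores))  # 0.2
    print(e_value(8.5, de_novo_scores, 1000))      # 200.0
```

Because the background is recomputed for every spectrum, the same raw score can be significant for a clean spectrum and unremarkable for a noisy one, which is the point of deriving the statistics per spectrum.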
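
Scoring every four-residue tag at a terminus can be pictured with a small Python sketch. Since the scoring function used in RAId is not described here, the sketch assumes a deliberately simple score: the number of singly charged b ions (for N-terminal tags) or y ions (for C-terminal tags) matched by an observed peak, with 0.5 Da as a hypothetical low-resolution tolerance. It enumerates all 20^4 = 160,000 tags per terminus depth first, so each shared prefix is looked up only once. The function names, the restriction to unmodified residues, and the score itself are assumptions, not RAId's implementation.

```python
from bisect import bisect_left
from typing import Dict, List

# Monoisotopic residue masses in daltons (unmodified residues; I and L coincide).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # added to singly charged b and y ions
WATER = 18.010565   # carried by y ions (C-terminal fragments)


def has_peak(peaks: List[float], mz: float, tol: float) -> bool:
    """True if the sorted peak list has a peak within +/- tol of mz."""
    i = bisect_left(peaks, mz - tol)
    return i < len(peaks) and peaks[i] <= mz + tol


def score_terminal_tags(peaks: List[float], c_terminal: bool = False,
                        length: int = 4, tol: float = 0.5) -> Dict[str, int]:
    """Score every residue tag of `length` at one peptide terminus (toy score).

    Score = number of b ions (N-terminal tags) or y ions (C-terminal tags)
    matched by an observed peak. Tags grow one residue at a time, depth first,
    so each shared prefix mass and its peak lookup are computed once.
    """
    peaks = sorted(peaks)
    offset = PROTON + (WATER if c_terminal else 0.0)
    scores: Dict[str, int] = {}

    def extend(tag: str, prefix_mass: float, hits: int) -> None:
        if len(tag) == length:
            # C-terminal tags are built from the C terminus inward; store them N->C.
            scores[tag[::-1] if c_terminal else tag] = hits
            return
        for residue, residue_mass in RESIDUE_MASS.items():
            mass = prefix_mass + residue_mass
            matched = has_peak(peaks, mass + offset, tol)
            extend(tag + residue, mass, hits + int(matched))

    extend("", 0.0, 0)
    return scores


if __name__ == "__main__":
    # b ions of the hypothetical N-terminal tag PEPT.
    b_ions = [98.060036, 227.102626, 324.155386, 425.203066]
    n_scores = score_terminal_tags(b_ions)
    print(max(n_scores, key=n_scores.get))  # PEPT
```

Isobaric residues such as I and L always receive identical scores under this scheme, and a practical scorer would likely also weigh peak intensities and additional ion types.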

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Intramural Research (Z01)
Project #: 1Z01LM092404-03
Application #: 7316282
Study Section: (CBB)
Support Year: 3
Fiscal Year: 2006
Organization: National Library of Medicine
Country: United States

Publications:
Alves, Gelio; Yu, Yi-Kuo (2008) Statistical Characterization of a 1D Random Potential Problem - with applications in score statistics of MS-based peptide sequencing. Physica A 387:6538-44
Alves, Gelio; Wu, Wells W; Wang, Guanghui et al. (2008) Enhancing peptide identification confidence by combining search methods. J Proteome Res 7:3102-13
Alves, Gelio; Ogurtsov, Aleksey Y; Kwok, Siwei et al. (2008) Detection of co-eluted peptides using database search methods. Biol Direct 3:27
Alves, Gelio; Ogurtsov, Aleksey Y; Yu, Yi-Kuo (2007) RAId_DbS: peptide identification using database searches with realistic statistics. Biol Direct 2:25
Alves, Gelio; Ogurtsov, Aleksey Y; Wu, Wells W et al. (2007) Calibrating E-values for MS2 database search methods. Biol Direct 2:26
Alves, Gelio; Yu, Yi-Kuo (2005) Robust accurate identification of peptides (RAId): deciphering MS2 data using a structured library search with de novo based statistics. Bioinformatics 21:3726-32
Souza, J A; Yu, Yi-Kuo; Neumeier, J J et al. (2005) Method for analyzing second-order phase transitions: application to the ferromagnetic transition of a polaronic system. Phys Rev Lett 94:207209
Yu, Yi-Kuo; Altschul, Stephen F (2005) The construction of amino acid substitution matrices for the comparison of proteins with non-standard compositions. Bioinformatics 21:902-11
Przytycka, Teresa M; Yu, Yi-Kuo (2004) Scale-free networks versus evolutionary drift. Comput Biol Chem 28:257-64