Proteomic research is among the most important areas of the post-genomic era. Recent advances in tandem mass spectrometry (MS/MS) have made large-scale protein identification feasible. The key to mass-spectrometry-based proteomics is peptide sequencing. There are, in general, two approaches to identifying peptides from tandem mass spectrometry data: the library search method and the de novo method. The major challenge in peptide sequencing, whether by library search or de novo, is the proper interpretation of statistical significance. Employing scaling theory from statistical physics, we have developed a systematic method for assigning statistical significance. A heuristic version of this assignment is currently implemented in RAId, a coherent method we developed to identify peptides from their associated tandem mass spectrometry data. RAId performs a novel de novo sequencing step followed by a search in a peptide library that we created. Because the noise in a spectrum depends on the experimental conditions, the instrument used, and many other factors, it cannot be predicted even when the peptide sequence is known; its characteristics can be uncovered only once a spectrum is given. Through our de novo sequencing, we obtain spectrum-specific background score statistics for the library search. When the database search fails to return significant hits, the top-ranking de novo sequences become candidates for new peptides that are not yet in the database. To ensure the broadest coverage, our database search, unlike typical library searches, places no constraint on the number of missed cleavages. For each spectrum, RAId reports library hits along with the top-ranking de novo sequences that have no library hits. The most remarkable feature of RAId is its use of spectrum-specific background statistics, which enables it to perform well even when the spectral quality is marginal. Other important features of RAId include its potential for stand-alone de novo sequencing and the ease of incorporating post-translational modifications.
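The idea of scoring library hits against a background measured on the same spectrum can be illustrated with a minimal sketch. It is not RAId's implementation: RAId's statistics are derived from scaling theory, whereas this sketch substitutes a simple empirical tail estimate. The function names, the E-value cutoff, and the peptide strings are hypothetical, and the background scores are assumed to come from de novo candidates scored against the same spectrum.

```python
from typing import Dict, List, Sequence, Tuple


def empirical_p_value(score: float, background: Sequence[float]) -> float:
    """Estimate P(background score >= score) with add-one smoothing.

    `background` is assumed to hold scores of de novo candidates
    for the same spectrum (an assumption of this sketch).
    """
    exceed = sum(1 for b in background if b >= score)
    return (exceed + 1) / (len(background) + 1)


def significant_hits(
    library_scores: Dict[str, float],
    background: Sequence[float],
    e_cutoff: float = 1.0,  # hypothetical cutoff, not taken from RAId
) -> List[Tuple[str, float]]:
    """Return (peptide, E-value) pairs with E-value <= e_cutoff, best first.

    The E-value is the p-value multiplied by the number of library
    peptides scored against the spectrum.
    """
    n_scored = len(library_scores)
    hits = []
    for peptide, score in library_scores.items():
        e_value = empirical_p_value(score, background) * n_scored
        if e_value <= e_cutoff:
            hits.append((peptide, e_value))
    return sorted(hits, key=lambda pair: pair[1])


# Hypothetical data: 99 background scores and two library candidates.
background_scores = [float(s) for s in range(1, 100)]
library = {"PEPTIDEK": 120.0, "SAMPLER": 50.0}
hits = significant_hits(library, background_scores)
```

If `hits` comes back empty for a spectrum, the top-ranking de novo sequences would be the ones reported as candidates for peptides not yet in the library, as described above.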

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Intramural Research (Z01)
Project #: 1Z01LM092404-02
Application #: 7148171
Study Section: (CBB)
Support Year: 2
Fiscal Year: 2005

Name: National Library of Medicine
Country: United States

Alves, Gelio; Yu, Yi-Kuo (2008) Statistical Characterization of a 1D Random Potential Problem - with applications in score statistics of MS-based peptide sequencing. Physica A 387:6538-6544
Alves, Gelio; Wu, Wells W; Wang, Guanghui et al. (2008) Enhancing peptide identification confidence by combining search methods. J Proteome Res 7:3102-13
Alves, Gelio; Ogurtsov, Aleksey Y; Kwok, Siwei et al. (2008) Detection of co-eluted peptides using database search methods. Biol Direct 3:27
Alves, Gelio; Ogurtsov, Aleksey Y; Yu, Yi-Kuo (2007) RAId_DbS: peptide identification using database searches with realistic statistics. Biol Direct 2:25
Alves, Gelio; Ogurtsov, Aleksey Y; Wu, Wells W et al. (2007) Calibrating E-values for MS2 database search methods. Biol Direct 2:26
Alves, Gelio; Yu, Yi-Kuo (2005) Robust accurate identification of peptides (RAId): deciphering MS2 data using a structured library search with de novo based statistics. Bioinformatics 21:3726-32
Souza, J A; Yu, Yi-Kuo; Neumeier, J J et al. (2005) Method for analyzing second-order phase transitions: application to the ferromagnetic transition of a polaronic system. Phys Rev Lett 94:207209
Yu, Yi-Kuo; Altschul, Stephen F (2005) The construction of amino acid substitution matrices for the comparison of proteins with non-standard compositions. Bioinformatics 21:902-11
Przytycka, Teresa M; Yu, Yi-Kuo (2004) Scale-free networks versus evolutionary drift. Comput Biol Chem 28:257-64