Proteomics is among the most important research areas of the post-genomic era. Recent advances in tandem mass spectrometry (MS/MS) have made large-scale protein identification feasible. In an MS/MS experiment, the mass-to-charge ratio of a parent peptide ion is measured first. The parent ion is then bombarded with a noble gas, and the product ions (charged fragments of the peptide) are collected by the analyzer; the mass-to-charge ratios of all fragment ions make up the MS/MS spectrum of the selected parent ion. There are, in general, two approaches to identifying peptides from tandem mass spectrometry data: the library search method and the de novo method. The peptide library usually comes from theoretically digesting the proteins in a given protein database. The key step in library search methods is scoring the query spectrum against the spectra of theoretically fragmented peptides from the database. De novo methods, by contrast, attempt to derive candidate peptides without a prior knowledge base such as a peptide database. The de novo method faces a combinatorial increase in the number of possible peptides as peptide molecular weight grows. The library search method, for its part, needs significant improvement in statistical characterization. Fundamentally, the reliability of peptide identifications produced by analysis tools remains controversial. For example, it has been shown that using default search parameters can result in a large percentage of misidentified proteins. Very high scoring thresholds, on the other hand, yield very low error rates but discard most true peptide hits. Although progress has been made in estimating the false-positive and false-negative rates for a given data set, this does not increase confidence in the identification of any individual peptide.
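The library-construction and fragmentation steps described above can be illustrated with a minimal sketch. It assumes trypsin as the digesting enzyme (cleavage after K or R, but not before P) and singly charged b and y fragment ions; neither choice is stated in this abstract, and the function names are hypothetical rather than part of any existing tool.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # mass of H2O added to an intact peptide
PROTON = 1.007276   # mass of the charge-carrying proton


def tryptic_digest(protein, missed_cleavages=0):
    """Assumed enzyme rule: cut after K or R unless the next residue is P."""
    sites = [0]
    for i, aa in enumerate(protein[:-1]):
        if aa in "KR" and protein[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(protein))
    peptides = []
    for i in range(len(sites) - 1):
        # Join up to `missed_cleavages` extra adjacent segments.
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(sites))):
            peptides.append(protein[sites[i]:sites[j]])
    return peptides


def parent_mz(peptide, charge=1):
    """Mass-to-charge ratio of the intact (parent) peptide ion."""
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


def fragment_ions(peptide):
    """Singly charged b (N-terminal) and y (C-terminal) fragment m/z values."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    prefix = 0.0
    for m in masses[:-1]:
        prefix += m
        b_ions.append(prefix + PROTON)
    suffix = 0.0
    for m in reversed(masses[1:]):
        suffix += m
        y_ions.append(suffix + WATER + PROTON)
    return b_ions, y_ions
```

For example, `tryptic_digest("AKRPGK")` yields the peptides `AK` and `RPGK`; the R followed by P is not cleaved. A library search would then compare the theoretical fragment m/z values of each such peptide with the peaks of the query spectrum.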
The PI therefore proposes to increase confidence in the identification of individual peptides by using spectrum-specific, database-independent statistics produced by de novo methods. We have been developing a computational tool, RAId (Robust Accurate Identification), to identify peptides from their tandem mass spectrometry data. RAId performs a novel de novo sequencing step followed by a search in a peptide library that we created. Because the noise in a spectrum depends on experimental conditions, the instrument used, and many other factors, it cannot be predicted even when the peptide sequence is known. The characteristics of the noise can be uncovered only once a spectrum is given. Through our de novo sequencing, we obtain spectrum-specific, database-independent background score statistics for the library search. When the database search fails to return significant hits, the top-ranking de novo sequences naturally become candidates for new peptides that are not yet in the database. Unlike typical library searches, our database search places no constraint on the number of missed cleavages. For each spectrum, RAId reports library hits along with the top-ranking de novo sequences that have no library hits. The most remarkable feature of RAId is its use of spectrum-specific, database-independent background statistics, which enables it to perform well even when spectrum quality is marginal. Other important features of RAId include its potential for stand-alone de novo sequencing and the ease of incorporating post-translational modifications. We are currently preparing a manuscript on RAId and several related manuscripts.
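To make the idea of spectrum-specific background statistics concrete, the following is a minimal sketch and not RAId's actual procedure. It assumes that a set of background scores has already been obtained for one spectrum (for instance, from de novo candidate sequences scored against that spectrum) and estimates how often a score at least as good as a library hit would arise by chance. The function names and the simple empirical tail estimate are illustrative assumptions.

```python
def empirical_p_value(hit_score, background_scores):
    """Fraction of background scores at least as good as the hit.

    A pseudo-count of one is added so the estimate is never exactly zero
    when the hit exceeds every background score.
    """
    n_as_good = sum(1 for s in background_scores if s >= hit_score)
    return (n_as_good + 1) / (len(background_scores) + 1)


def e_value(p_value, n_candidates):
    """Expected number of chance hits this good among n_candidates peptides."""
    return p_value * n_candidates
```

Because the background scores come from the spectrum itself, a noisy spectrum produces a broader background distribution and therefore a less significant p-value for the same raw score, which is the kind of per-spectrum calibration the text describes.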

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Intramural Research (Z01)
Project #: 1Z01LM092404-01
Application #: 6988474
Study Section: (CBB)
Project Start:
Project End:
Budget Start:
Budget End:
Support Year: 1
Fiscal Year: 2004
Total Cost:
Indirect Cost:

Name: National Library of Medicine
Department:
Type:
DUNS #:
City:
State:
Country: United States
Zip Code:

Alves, Gelio; Yu, Yi-Kuo (2008) Statistical Characterization of a 1D Random Potential Problem - with applications in score statistics of MS-based peptide sequencing. Physica A 387:6538-6544
Alves, Gelio; Wu, Wells W; Wang, Guanghui et al. (2008) Enhancing peptide identification confidence by combining search methods. J Proteome Res 7:3102-13
Alves, Gelio; Ogurtsov, Aleksey Y; Kwok, Siwei et al. (2008) Detection of co-eluted peptides using database search methods. Biol Direct 3:27
Alves, Gelio; Ogurtsov, Aleksey Y; Yu, Yi-Kuo (2007) RAId_DbS: peptide identification using database searches with realistic statistics. Biol Direct 2:25
Alves, Gelio; Ogurtsov, Aleksey Y; Wu, Wells W et al. (2007) Calibrating E-values for MS2 database search methods. Biol Direct 2:26
Alves, Gelio; Yu, Yi-Kuo (2005) Robust accurate identification of peptides (RAId): deciphering MS2 data using a structured library search with de novo based statistics. Bioinformatics 21:3726-32
Souza, J A; Yu, Yi-Kuo; Neumeier, J J et al. (2005) Method for analyzing second-order phase transitions: application to the ferromagnetic transition of a polaronic system. Phys Rev Lett 94:207209
Yu, Yi-Kuo; Altschul, Stephen F (2005) The construction of amino acid substitution matrices for the comparison of proteins with non-standard compositions. Bioinformatics 21:902-11
Przytycka, Teresa M; Yu, Yi-Kuo (2004) Scale-free networks versus evolutionary drift. Comput Biol Chem 28:257-64