This is a revised competing continuation application for a project that is currently in its eighth year of funding. The long-term goal of the proposed research is to improve automated peptide sequencing and protein identification by tandem mass spectrometry (MS/MS). The underlying hypothesis is that the computer algorithms used for peptide sequencing and protein identification can be improved by determining how peptides fragment at a molecular level. The proposed research has three specific aims: analysis of a large database of peptide dissociation spectra; the use of model systems to develop a detailed understanding of unusual fragmentation patterns, such as suppressed cleavage C-terminal to Ser and Thr; and application of the results of the statistical analysis to develop a probability-based sequencing algorithm. The research is expected to have immediate and longer-term impacts on practical peptide sequencing and protein identification by MS/MS and to improve the fundamental knowledge of how peptide structure influences peptide dissociation. The main goal of Specific Aim I is to carry out a statistical analysis of the peptide dissociation patterns contained in a large database (>30,000) of ion trap tandem mass spectra for which correct sequences are known. The results of the analyses (relative abundances of bond cleavages) will be used to determine chemical interactions or residue combinations involved in promoting specific cleavage pathways and to improve general peptide fragmentation models. The goal of Specific Aim II is to use a systematic gas-phase dissociation mechanism approach, as performed successfully by the Wysocki research group in the past, to examine in more detail unusual fragmentation behavior identified in Specific Aim I. Based on preliminary results, the first problem to be addressed is why cleavage at Ser and Thr is enhanced at their N-terminal amide bond and suppressed at their C-terminal amide bond.
The goal of Specific Aim III is to apply the results of the statistical analyses in Specific Aim I to develop a new peptide/protein identification algorithm called SQID (SeQuence IDentification). This database search algorithm differs from others currently available in that the statistics of fragmentation of large numbers of actual spectra provide "truth" sets that are used to define expected fragmentation behavior. Several truth sets, distinguished by key structural features of the peptide sequences whose spectra were used to compile the set, provide the foundation for the probability-based scoring routine. Histograms of relative cleavage probability for pairwise Xxx-Zzz cleavages between adjacent amino acids of the peptides in a given "truth" data set will be computed. For each spectrum of an unknown, candidate sequences are selected from a protein sequence database by mass (similar to many current comparative algorithms). The appropriate truth set of histograms, chosen to match a dominant structural motif of the candidate sequence, will be used to calculate the probability that a given candidate sequence is the actual sequence of the unknown peptide. This work exploits a successful collaboration established in the previous funding cycle with a local scientific programmer/algorithm designer, Dr. Joe Triscari.
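The pairwise-cleavage idea can be illustrated with a minimal Python sketch. It is not the SQID implementation: the function names `build_pair_table` and `score_candidate`, the input layout (one relative abundance per backbone bond), the use of a mean in place of a full histogram, and the log-likelihood-style score are all assumptions made for illustration. Candidate selection by mass and the choice among several truth sets are omitted.

```python
import math
from collections import defaultdict


def build_pair_table(truth_set):
    """Mean relative cleavage abundance for each adjacent residue pair.

    truth_set: iterable of (sequence, bond_abundances), where
    bond_abundances[i] is the summed fragment abundance for cleavage
    between residues i and i+1 (hypothetical input layout).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for sequence, bond_abundances in truth_set:
        total = sum(bond_abundances)
        # Skip empty spectra and entries whose bond count does not match.
        if total <= 0 or len(bond_abundances) != len(sequence) - 1:
            continue
        for i, abundance in enumerate(bond_abundances):
            pair = sequence[i:i + 2]          # e.g. "SP" for Ser-Pro
            sums[pair] += abundance / total   # normalize within the spectrum
            counts[pair] += 1
    return {pair: sums[pair] / counts[pair] for pair in sums}


def score_candidate(candidate, bond_abundances, pair_table, floor=1e-3):
    """Assumed stand-in score: observed-weighted log of expected cleavage.

    Pairs absent from the table receive a small floor value so that the
    logarithm stays defined. Higher (less negative) scores mean better
    agreement with the truth-set statistics.
    """
    total = sum(bond_abundances)
    if total <= 0 or len(bond_abundances) != len(candidate) - 1:
        return float("-inf")
    expected = [
        max(pair_table.get(candidate[i:i + 2], floor), floor)
        for i in range(len(candidate) - 1)
    ]
    norm = sum(expected)
    return sum(
        (observed / total) * math.log(value / norm)
        for observed, value in zip(bond_abundances, expected)
    )


# Toy truth set (made-up abundances, for illustration only).
toy_truth = [("ASPK", [0.1, 0.8, 0.1]), ("GSPR", [0.2, 0.7, 0.1])]
table = build_pair_table(toy_truth)
observed = [0.1, 0.8, 0.1]
ranked = sorted(
    ["ASPK", "APSK"],
    key=lambda seq: score_candidate(seq, observed, table),
    reverse=True,
)
```

In this toy case the candidate whose residue pairs match the dominant cleavage in the truth set ranks first. A fuller treatment would keep the whole histogram for each pair rather than only its mean, and would assign the observed bond abundances separately for each candidate sequence.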