This subproject is one of many research subprojects utilizing the resources provided by a Center grant funded by NIH/NCRR. The subproject and investigator (PI) may have received primary funding from another NIH source, and thus could be represented in other CRISP entries. The institution listed is for the Center, which is not necessarily the institution for the investigator.

Examination of LC-MS/MS raw data from complex peptide mixtures (e.g., those found in major histocompatibility complexes or in protein digests encountered in proteomics) shows that only a fraction of the spectra are intense enough to give good-quality product ion spectra. Moreover, many of the 'good-quality' spectra come from non-peptidic contaminants and will not yield sequence information. If all the extraneous spectra could be identified and eliminated from the search beforehand, the search could be performed much faster and more efficiently. Analyzing MS/MS statistical data with multivariate methods and including chemically relevant classifiers can improve this discrimination. A new computer program was written in Perl. The program extracts statistical information and tentative partial sequences from each raw spectrum. These data are then used to build a statistical model by the partial least squares (PLS) method. The correlation between the predicted score and the actual one is sufficient to eliminate a large percentage of the 'noise' spectra. Since the rate of modification in the mammalian proteome is estimated at about 3 post-translational modifications (PTMs) per protein, and only spectra with a peptidic signature will be retained, the remaining unidentified spectra will be good candidates for a correlation-based PTM search program.
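The filtering idea described above (per-spectrum statistics plus a chemically motivated feature, fed to a PLS model that predicts a quality score) can be illustrated with a minimal sketch. This is not the Perl program from the subproject. The feature set, the function names (`spectrum_features`, `fit_pls1`, `predict_pls1`), the mass tolerance, the reduced residue-mass table, and the assumption of singly charged fragment ions are all hypothetical choices made for illustration. The PLS fit is a plain single-response NIPALS variant written in pure Python.

```python
import math

# Monoisotopic residue masses (Da) for a few amino acids.
# Illustrative subset only; a real tool would cover all residues.
RESIDUE_MASSES = {"G": 57.02146, "A": 71.03711, "S": 87.03203,
                  "P": 97.05276, "V": 99.06841, "L": 113.08406}

def spectrum_features(peaks, tol=0.02):
    """Summarize one MS/MS spectrum given as (m/z, intensity) pairs.

    Assumes singly charged fragments (hypothetical simplification).
    """
    mzs = sorted(mz for mz, _ in peaks)
    intensities = [inten for _, inten in peaks]
    total = sum(intensities)
    # Peak pairs spaced by a residue mass hint at a peptide ion ladder,
    # i.e. a tentative partial sequence.
    gaps = sum(
        1
        for i, lo in enumerate(mzs)
        for hi in mzs[i + 1:]
        if any(abs(hi - lo - m) <= tol for m in RESIDUE_MASSES.values())
    )
    return [math.log10(total), float(len(peaks)),
            max(intensities) / total, float(gaps)]

def fit_pls1(X, y, n_components):
    """Fit a single-response PLS model by NIPALS with deflation."""
    n, k = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(k)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(k)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                # centered y
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X most covariant with the residual y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(k)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(k)) for i in range(n)]
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(k)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * p[j] for j in range(k)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components

def predict_pls1(model, x):
    """Predict the score for one feature vector by replaying the deflation."""
    x_mean, y_mean, components = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    score = y_mean
    for w, p, q in components:
        t = sum(ej * wj for ej, wj in zip(e, w))
        score += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return score
```

In such a scheme, the model would be trained on spectra whose search scores are already known, and a new spectrum would be passed to the search engine only if `predict_pls1(model, spectrum_features(peaks))` exceeds a cutoff chosen on held-out data.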