Although peptide identification has been studied intensively, assessing its statistical accuracy remains challenging. Many peptide identification methods search a database and assign an E-value to each peptide hit, but the E-values reported by different methods do not agree with each other, and none of them agrees with the textbook definition of the E-value. Over the past year, one of our major efforts has been to develop a statistical approach that properly accounts, during data analysis, for proteotypic peptides, that is, peptides that are consistently observed in mass spectrometry based proteomics experiments. We have shown that proteotypic information does improve retrieval performance, provided that it is incorporated into the database with sufficient quality control. We have submitted these results to the Journal of Proteomics for consideration for publication.
Another direction we have embarked on is to use the score statistics of all possible peptides in various applications. In 2008, we showed that it is possible to score trillions of trillions of peptides and thereby form the score histogram of all possible peptides for a given MS spectrum and an additive scoring function. In the past year, we have put this somewhat theoretical result to practical use by re-expressing several well-known scoring functions from computational proteomics in additive form, thus obtaining unified score statistics for those scoring functions. A main difficulty we had to overcome was learning how each scoring function preprocesses the query spectrum. This critical step largely determines the final score of each candidate peptide. To accomplish this, we had to dig into other analysis programs and extract their heuristic filtering rules.
Having accomplished this daunting task, we built an application tool that (1) combines search results using all-possible-peptide score statistics and (2) reassigns E-values. Our new application, RAId_aPS, is now available on our group website, and the results have been written up and submitted to PLoS One for consideration for publication.
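To illustrate why an additive scoring function makes the all-possible-peptide score histogram tractable, the following is a minimal sketch and not the RAId_aPS implementation. It assumes integer residue masses, a four-letter toy alphabet, and a hypothetical scoring function in which a peptide gains a fixed integer score whenever one of its cumulative prefix masses (including the full peptide mass) coincides with a scored peak. The names `RESIDUE_MASSES`, `score_histogram`, `e_value`, and `peak_scores` are all illustrative. Because the score is a sum over prefix masses, a dynamic program over mass can count peptides per score without enumerating them, and an E-value in the textbook sense (the expected number of random candidates scoring at least as high) then follows from the tail of the histogram.

```python
from collections import defaultdict

# Assumption: nominal integer residue masses for a reduced toy alphabet.
RESIDUE_MASSES = {"G": 57, "A": 71, "S": 87, "V": 99}


def score_histogram(target_mass, peak_scores, residue_masses=RESIDUE_MASSES):
    """Count all residue sequences of total mass target_mass by additive score.

    peak_scores maps an integer prefix mass to the integer score a peptide
    gains when one of its cumulative prefix masses equals that value.
    Returns a dict mapping score -> number of peptides with that score.
    """
    # table[m] maps score -> number of sequences whose total mass is m.
    table = [defaultdict(int) for _ in range(target_mass + 1)]
    table[0][0] = 1  # the empty sequence has mass 0 and score 0
    for m in range(1, target_mass + 1):
        gain = peak_scores.get(m, 0)  # score contributed by a prefix of mass m
        for mass in residue_masses.values():
            if mass > m:
                continue
            # Append one residue to every sequence of mass m - mass.
            for score, count in table[m - mass].items():
                table[m][score + gain] += count
    return dict(table[target_mass])


def e_value(histogram, score, database_size):
    """Expected number of random candidates with a score >= the given score.

    Assumption: each of the database_size candidates behaves like a peptide
    drawn uniformly from the all-possible-peptide histogram.
    """
    total = sum(histogram.values())
    tail = sum(n for s, n in histogram.items() if s >= score)
    return database_size * tail / total


if __name__ == "__main__":
    hist = score_histogram(185, {57: 1, 128: 1})
    print(hist)
    print(e_value(hist, 2, 30))
```

The table grows with the mass range and the number of distinct scores rather than with the number of peptides, which is what makes the histogram computable even when the number of candidate sequences is astronomically large.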