Although peptide identification has been studied intensively, its statistical accuracy remains a challenge. Many peptide identification methods search a database and assign an E-value to each peptide hit. However, the E-values reported by different methods do not agree with one another, and none of them agrees with the textbook definition of the E-value, namely the expected number of random candidates scoring at least as well as the observed hit. This makes it difficult to combine search results from different methods, particularly when one wishes to combine methods with user-assigned weights.
Over the past year, one of our major efforts has been to develop a statistical approach that properly accounts, during data analysis, for proteotypic peptides, that is, peptides that are consistently observed in mass spectrometry (MS)-based proteomics experiments. We have shown that proteotypic information does improve retrieval performance, provided that it is incorporated into the database with sufficient quality control. These results were published in the Journal of Proteomics (doi:10.1016/j.jprot.2010.10.005).
Another direction we have pursued is to use the score statistics of all possible peptides in various applications. In 2008, we showed that it is possible to score trillions of trillions of peptides to form the score histogram of all possible peptides for a given MS spectrum and an additive scoring function. Over the past year, we have turned this somewhat theoretical result into practical use by re-expressing several well-known scoring functions from computational proteomics in additive form, thereby obtaining unified score statistics for those scoring functions. A main difficulty we had to overcome was learning how each scoring function preprocesses the query spectrum, a critical step that largely determines the final score of each candidate peptide. To do so, we had to dig into other analysis programs to extract their heuristic filtering rules.
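The idea behind the all-possible-peptide score histogram can be illustrated with a small dynamic-programming sketch. It assumes integer-binned residue masses and a hypothetical additive score in which every prefix (fragment) mass of a peptide contributes a value read from a per-mass score table derived from the spectrum. The function names `all_peptide_score_histogram`, `tail_pvalue`, and `e_value`, the toy two-letter alphabet, and the score table are illustrative assumptions, not the RAId_aPS implementation. Because peptides sharing a prefix mass are counted together, the histogram is built without enumerating peptides one by one. The E-value helper assumes the textbook relation E = N × P, where N is the number of database candidates actually scored.

```python
from collections import Counter
from typing import Dict


def all_peptide_score_histogram(
    residue_masses: Dict[str, int],
    mass_score: Dict[int, int],
    target_mass: int,
) -> Counter:
    """Return a Counter mapping score -> number of peptides with total mass target_mass.

    Hypothetical additive model: a peptide's score is the sum of mass_score[m]
    over all of its prefix masses m (masses absent from the table contribute 0).
    """
    assert all(m > 0 for m in residue_masses.values()), "residue masses must be positive"
    # hist[m][s] = number of prefixes with mass m and accumulated score s
    hist = [Counter() for _ in range(target_mass + 1)]
    hist[0][0] = 1  # the empty prefix
    for m in range(target_mass + 1):
        if not hist[m]:
            continue  # no prefix reaches this mass
        for r_mass in residue_masses.values():
            nm = m + r_mass
            if nm > target_mass:
                continue
            gain = mass_score.get(nm, 0)
            # Extend every prefix of mass m by one residue, all at once per score bin.
            for s, count in hist[m].items():
                hist[nm][s + gain] += count
    return hist[target_mass]


def tail_pvalue(hist: Counter, score: int) -> float:
    """Fraction of all possible peptides scoring at least `score`."""
    total = sum(hist.values())
    return sum(c for s, c in hist.items() if s >= score) / total


def e_value(pvalue: float, n_candidates: int) -> float:
    """Textbook E-value: expected number of random candidates scoring at least as well."""
    return pvalue * n_candidates


# Toy usage: residues "A" (mass 1) and "B" (mass 2), peptides of total mass 3.
# AAA has prefix masses 1, 2, 3 -> 1 + 10 + 0 = 11; AB -> 1 + 0 = 1; BA -> 10 + 0 = 10.
toy_hist = all_peptide_score_histogram({"A": 1, "B": 2}, {1: 1, 2: 10}, 3)
```

In a realistic setting the masses would be finely binned, the alphabet would be the amino acids, and the per-mass scores would come from the processed spectrum, which is why the preprocessing rules mentioned above matter so much.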
After accomplishing this daunting task, we built an application tool that allows for (1) combining search results using all-possible-peptide score statistics and (2) reassigning E-values. Our new application, RAId_aPS, is now available on our group website, and the results have been published in PLoS One (doi:10.1371/journal.pone.0015438). When prior knowledge is available, it is often desirable to weight search methods differently before combining their search results. In one of our 2008 publications, we provided a way to combine search results democratically, that is, with equal weights. When the weights differ, however, the combination can become numerically unstable if some of the weights are nearly degenerate. Over the past year, we have devised a mathematical framework that completely eliminates this possible instability. This work was recently published in PLoS One (doi:10.1371/journal.pone.0022647).
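Where the instability comes from can be seen in a minimal sketch, under these assumptions: each method's P-value is independent and uniform under the null, so each −ln p_i is exponentially distributed with unit mean. For equal weights we assume the standard Fisher-type formula P = τ Σ_{k=0}^{L−1} (−ln τ)^k / k!, with τ = Π p_i. For weights w_i, the statistic Y = −Σ w_i ln p_i has the closed-form tail Σ_i [Π_{j≠i} w_i/(w_i − w_j)] e^{−y/w_i}. The coefficients in that tail diverge as two weights approach each other. This closed form is shown only to expose the problem; it is not the stabilized framework of the PLoS One paper, and the function names are hypothetical.

```python
import math
from typing import Sequence


def combine_equal_weights(pvalues: Sequence[float]) -> float:
    """Fisher-type combination: P = tau * sum_{k=0}^{L-1} (-ln tau)^k / k!, tau = prod(p_i)."""
    log_tau = sum(math.log(p) for p in pvalues)
    x = -log_tau
    term, series = 1.0, 1.0  # k = 0 term of the series
    for k in range(1, len(pvalues)):
        term *= x / k  # builds x^k / k! incrementally
        series += term
    return math.exp(log_tau) * series


def combine_weighted_closed_form(pvalues: Sequence[float], weights: Sequence[float]) -> float:
    """Tail probability P(Y >= y_obs) for Y = sum_i w_i * E_i with independent E_i ~ Exp(1).

    Requires pairwise-distinct positive weights. The coefficients w_i / (w_i - w_j)
    blow up as two weights approach each other, and the alternating sum then suffers
    catastrophic cancellation: this formula exhibits the instability, it does not fix it.
    """
    y_obs = -sum(w * math.log(p) for p, w in zip(pvalues, weights))
    total = 0.0
    for i, wi in enumerate(weights):
        coef = 1.0
        for j, wj in enumerate(weights):
            if j != i:
                coef *= wi / (wi - wj)
        total += coef * math.exp(-y_obs / wi)
    return total
```

With weights such as `[1.0, 1.0 + 1e-12]`, the two coefficients are of order 10^12 with opposite signs, so most significant digits cancel even though the correct answer should lie close to the equal-weight result. Exactly equal weights cause a division by zero. Both failure modes motivate a formulation that remains stable as weights become degenerate.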