This subproject is one of many research subprojects utilizing the resources provided by a Center grant funded by NIH/NCRR. Primary support for the subproject and the subproject's principal investigator may have been provided by other sources, including other NIH sources. The Total Cost listed for the subproject likely represents the estimated amount of Center infrastructure utilized by the subproject, not direct funding provided by the NCRR grant to the subproject or subproject staff. The problem of identifying proteins from a shotgun proteomics experiment has not been definitively solved. Identifying the proteins in a sample requires ranking them, ideally with interpretable scores. In particular, ?degenerate? peptides, which map to multiple proteins, have made such a ranking difficult to compute. The problem of computing posterior probabilities for the proteins, which can be interpreted as confidence in a protein?s presence, has been especially daunting. Previous approaches have either ignored the peptide degeneracy problem completely, addressed it by computing a heuristic set of proteins or heuristic posterior probabilities, or estimated the posterior probabilities with sampling methods. We present a probabilistic model for protein identification in tandem mass spectrometry that recognizes peptide degeneracy. We then introduce graph-transforming algorithms that facilitate efficient computation of protein probabilities, even for large data sets. We evaluate our identification procedure on five different well-characterized data sets and demonstrate our ability to efficiently compute high-quality protein posteriors.
Showing the most recent 10 out of 583 publications