Mass spectrometry, the core technology in the field of proteomics, promises to enable scientists to identify and quantify the entire complement of molecules that make up a complex biological sample. In the biological and health sciences, mass spectrometry is commonly used in a high-throughput fashion to identify the proteins in a mixture. Currently, the primary bottleneck in this type of experiment is computational: existing algorithms for interpreting mass spectra are slow and fail to identify a large proportion of the spectra they are given. We propose to apply techniques and tools from the field of machine learning to the analysis of mass spectrometry data. We will build computational models of peptide fragmentation within the mass spectrometer, as well as larger-scale models of the entire mass spectrometry process. Using these models, we will design and validate algorithms for identifying the set of proteins that best explains an observed set of spectra. Software implementations of all of the methods will be made publicly available in a user-friendly form. In practical terms, this software will enable scientists to analyze and understand their mass spectrometry data more easily, efficiently, and accurately.

Relevance: Mass spectrometry has numerous applications that promise to improve human health, including a better understanding of disease phenotypes and the molecular mechanisms that underlie them, and far more sensitive and specific diagnostic and prognostic screens.

National Institutes of Health (NIH)
National Institute of Biomedical Imaging and Bioengineering (NIBIB)
Research Project (R01)
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Peng, Grace
University of Washington
Schools of Medicine
United States
Halloran, John T; Bilmes, Jeff A; Noble, William S (2016) Dynamic Bayesian Network for Accurate Detection of Peptides from Tandem Mass Spectra. J Proteome Res 15:2749-59
Noble, William Stafford (2015) Mass spectrometrists should search only for peptides they care about. Nat Methods 12:605-8
Spivak, Marina; Weston, Jason; Tomazela, Daniela et al. (2012) Direct maximization of protein identifications from tandem mass spectra. Mol Cell Proteomics 11:M111.012161
Serang, Oliver; Noble, William Stafford (2012) Faster mass spectrometry-based protein inference: junction trees are more efficient than sampling and marginalization by enumeration. IEEE/ACM Trans Comput Biol Bioinform 9:809-17
McIlwain, Sean; Mathews, Michael; Bereman, Michael S et al. (2012) Estimating relative abundances of proteins from shotgun proteomics data. BMC Bioinformatics 13:308
Granholm, Viktor; Noble, William Stafford; Käll, Lukas (2011) On using samples of known protein content to assess the statistical calibration of scores assigned to peptide-spectrum matches in shotgun proteomics. J Proteome Res 10:2671-8
Diament, Benjamin J; Noble, William Stafford (2011) Faster SEQUEST searching for peptide identification from tandem mass spectra. J Proteome Res 10:3871-9
Sharma, Vagisha; Eng, Jimmy K; Feldman, Sergey et al. (2010) Precursor charge state prediction for electron transfer dissociation tandem mass spectra. J Proteome Res 9:5438-44
Demir-Kavuk, Ozgur; Riedesel, Henning; Knapp, Ernst-Walter (2010) Exploring classification strategies with the CoEPrA 2006 contest. Bioinformatics 26:603-9
Serang, Oliver; MacCoss, Michael J; Noble, William Stafford (2010) Efficient marginalization to compute protein posterior probabilities from shotgun mass spectrometry data. J Proteome Res 9:5346-57

Showing the most recent 10 out of 22 publications