Proteins are the primary functional molecules in living cells, and tandem mass spectrometry provides the most efficient means of studying proteins in a high-throughput fashion. This proposal aims to use state-of-the-art methods from the fields of machine learning, statistics, and natural language processing to improve our ability to make sense of large tandem mass spectrometry data sets. Our project will focus on three key problems in the analysis of such data: (1) facilitating the use of previously annotated spectra to improve the annotation of new spectra, by creating a hybrid search scheme that compares an observed spectrum to a database composed of both theoretical spectra and previously annotated spectra; (2) enabling the efficient and accurate detection of peptides containing post-translational modifications and sequence variants; and (3) detecting sets of peptide species that are co-fragmented in the mass spectrometer and hence give rise to complex mixture spectra. Each of these aims will improve the ability of mass spectrometrists to efficiently and accurately identify and quantify proteins in complex mixtures. To increase the impact of our work, we will continue to make all of our tools available as free software.
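
The hybrid search idea in the first aim can be illustrated with a minimal sketch. This is not the project's method: the function names (`bin_spectrum`, `cosine`, `hybrid_search`), the 1.0 m/z bin width, and the use of plain cosine similarity are all assumptions chosen for illustration. The sketch scores an observed spectrum against one candidate pool that mixes theoretical spectra with previously annotated (library) spectra, and it omits precursor-mass filtering as well as any calibration of scores between the two sources.

```python
from math import sqrt


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins: {bin index: intensity}."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine(a, b):
    """Cosine similarity between two binned spectra (sparse dicts)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_search(observed, theoretical, library):
    """Return (score, peptide, source) for the best match, or None.

    `theoretical` and `library` map a peptide string to a list of
    (m/z, intensity) peaks. Both are hypothetical inputs for this sketch.
    """
    query = bin_spectrum(observed)
    candidates = [(pep, peaks, "theoretical") for pep, peaks in theoretical.items()]
    candidates += [(pep, peaks, "library") for pep, peaks in library.items()]
    best = None
    for peptide, peaks, source in candidates:
        score = cosine(query, bin_spectrum(peaks))
        if best is None or score > best[0]:
            best = (score, peptide, source)
    return best


# Toy data, invented purely to exercise the functions.
observed = [(100.2, 1.0), (200.4, 0.5)]
theoretical = {"PEPTIDE": [(100.1, 1.0), (300.0, 1.0)]}
library = {"PEPTIDEK": [(100.3, 0.9), (200.1, 0.6)]}
result = hybrid_search(observed, theoretical, library)
print(result)
```

In this toy case the library entry wins because it shares both binned peaks with the observed spectrum. A real hybrid scheme would also need to make scores from theoretical and empirical spectra comparable, which this sketch does not attempt.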

Public Health Relevance

The applications of mass spectrometry, and its promise for improving human health, are numerous; they include an increased understanding of disease phenotypes and the molecular mechanisms that underlie them, as well as vastly more sensitive and specific diagnostic and prognostic screens. However, making optimal use of mass spectrometry data requires sophisticated computational methods. This project will develop and apply novel statistical and machine learning methods for interpreting mass spectra.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Project #
Application #
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Krepkiy, Dmitriy
Project Start
Project End
Budget Start
Budget End
Support Year
Fiscal Year
Total Cost
Indirect Cost
University of Washington
Schools of Medicine
United States
Zip Code
Keich, Uri; Noble, William Stafford (2018) Controlling the FDR in imperfect matches to an incomplete database. J Am Stat Assoc 113:973-982
Lin, Andy; Howbert, J Jeffry; Noble, William Stafford (2018) Combining High-Resolution and Exact Calibration To Boost Statistical Power: A Well-Calibrated Score Function for High-Resolution MS2 Data. J Proteome Res 17:3644-3656
Hu, Alex; Lu, Yang Young; Bilmes, Jeff et al. (2018) Joint Precursor Elution Profile Inference via Regression for Peptide Detection in Data-Independent Acquisition Mass Spectra. J Proteome Res :
Bittremieux, Wout; Meysman, Pieter; Noble, William Stafford et al. (2018) Fast Open Modification Spectral Library Searching through Approximate Nearest Neighbor Indexing. J Proteome Res 17:3463-3474
Ting, Ying S; Egertson, Jarrett D; Bollinger, James G et al. (2017) PECAN: library-free peptide detection for data-independent acquisition tandem mass spectrometry data. Nat Methods 14:903-908
Noble, William Stafford; Keich, Uri (2017) Response to "Mass spectrometrists should search for all peptides, but assess only the ones they care about". Nat Methods 14:644
Keich, Uri; Noble, William Stafford (2017) Progressive calibration and averaging for tandem mass spectrometry statistical confidence estimation: Why settle for a single decoy? Res Comput Mol Biol 10229:99-116
Sakano, Hitomi; Zorio, Diego A R; Wang, Xiaoyu et al. (2017) Proteomic analyses of nucleus laminaris identified candidate targets of the fragile X mental retardation protein. J Comp Neurol 525:3341-3359
May, Damon H; Tamura, Kaipo; Noble, William S (2017) Param-Medic: A Tool for Improving MS/MS Database Search Yield by Optimizing Parameter Settings. J Proteome Res 16:1817-1824