Mass spectrometry, the core technology in the field of proteomics, promises to enable scientists to identify and quantify the entire complement of molecules that make up a complex biological sample. In the biological and health sciences, mass spectrometry is commonly used in a high-throughput fashion to identify proteins in a mixture. Currently, the primary bottleneck in this type of experiment is computational. Existing algorithms for interpreting mass spectra are slow and fail to identify a large proportion of the given spectra. We propose to apply techniques and tools from the field of machine learning to the analysis of mass spectrometry data. We will build computational models of peptide fragmentation within the mass spectrometer, as well as larger-scale models of the entire mass spectrometry process. Using these models, we will design and validate algorithms for identifying the set of proteins that best explains an observed set of spectra. Software implementations of all of the methods will be made publicly available in a user-friendly form. In practical terms, this software will enable scientists to analyze and understand their mass spectrometry data more easily, efficiently, and accurately.

Relevance: Mass spectrometry has numerous applications that promise to improve human health, including an increased understanding of disease phenotypes and the molecular mechanisms that underlie them, and vastly more sensitive and specific diagnostic and prognostic screens.