The last decade has witnessed rapid advances in mass spectrometry (MS) technology. In particular, liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) has become a popular analytical tool for characterizing complex protein samples in many branches of the life sciences, including microbiology, environmental science, plant biology, agriculture and biomedicine. This project aims to exploit publicly available proteomic data to predict the tandem mass (MS/MS) spectra of peptides. Successful prediction of peptide MS/MS spectra is of great theoretical interest (for better understanding the mechanisms of peptide fragmentation in mass spectrometers) and will significantly improve peptide identification, which is critical for the analysis of complex protein samples. The PIs of this project are actively involved in, and lead some of, the school and department outreach activities and events, including the annual summer camp for Girl Scouts. They plan to recruit students from HBCU institutions each year to participate in summer research on this project.
The PIs of the project propose to develop a sequence-to-sequence (seq2seq) deep learning model for predicting the full MS/MS spectra of peptides directly from their sequences without any assumption of fragmentation rules. They will also exploit a multitask learning (MTL) approach for predicting the MS/MS spectra of peptides containing post-translational modifications (PTMs), and the MS/MS spectra acquired using different ion activation methods, such as electron transfer dissociation (ETD). The deep learning models will be implemented and released as open-source software tools for use by the research community. Updates on the research project will be made available through the project website: www.predfull.com.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.