Mass spectrometry-based top-down proteomics has become one of the most informative approaches in protein analysis because it provides a bird's-eye view of intact proteoforms (protein forms) generated by post-translational modifications and sequence variations. Data-dependent acquisition and data-independent acquisition are the two main methods in top-down mass spectrometry. The former has been dominant, but it faces two main challenges in proteome-wide studies: low protein coverage (a typical experiment on human cells identifies only 200–400 proteins) and low reproducibility (technical triplicates share only about one third of identified proteoforms). Top-down data-independent acquisition mass spectrometry (TD-DIA-MS) has the potential to significantly increase protein coverage and improve reproducibility in proteome-wide studies. However, its application has been hampered by the complexity of the data and the lack of efficient software tools. To address this problem, we will propose new algorithms and machine learning models and develop the first software package for proteoform identification by TD-DIA-MS. The proposed research will be conducted by a group of researchers with complementary expertise. All the proposed algorithms will be implemented as user-friendly open-source software tools.
This project addresses the proteoform identification problem by top-down data-independent acquisition mass spectrometry. We will propose new machine learning models and new algorithms for high-throughput proteome-wide identification of complex proteoforms with post-translational modifications and sequence variations by using top-down data-independent acquisition mass spectrometry. The proposed methods will facilitate the study of the function of complex proteoforms and the discovery of proteome biomarkers.