Mass spectrometry-based top-down proteomics has emerged as one of the most informative approaches in protein analysis because it provides a bird's-eye view of all intact proteoforms generated by post-translational modifications and sequence variations. A major challenge in proteoform identification by database search is the combinatorial explosion of possible proteoforms that results from combinations of sequence variations, post-translational modifications, and other molecular events, such as protein degradation. Here, we propose a novel data model, called the mass graph, to efficiently represent a very large number of potential proteoforms, and we will design new mass graph-based alignment and filtering algorithms that precisely identify complex proteoforms at the proteome level. We will also develop a software pipeline that combines top-down mass spectrometry and RNA-Seq data to identify sample-specific proteoforms. The proposed research will be conducted by a group of researchers with complementary expertise. All of the proposed algorithms will be implemented as user-friendly open-source software tools.
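The abstract does not define the mass graph, so the following is only a minimal sketch of one plausible reading: nodes sit between consecutive residues, and each edge carries the mass of one residue in one of its allowed forms, so that every start-to-end path spells out one candidate proteoform. The function names, the short example sequence, and the choice of variable modifications (phosphorylation on serine, acetylation on lysine) are assumptions made for illustration. Truncations from protein degradation and sequence variants are left out.

```python
from math import prod

# Monoisotopic residue masses (Da) for the few amino acids used below.
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "K": 128.09496}

# Hypothetical variable modifications: residue -> list of mass shifts (Da).
# Phosphorylation (+79.96633) on S and acetylation (+42.01057) on K.
VARIABLE_MODS = {"S": [79.96633], "K": [42.01057]}


def build_mass_graph(sequence):
    """Return one list of edge masses per residue.

    Node i sits just before residue i, so a sequence of length n has
    n + 1 nodes. Every edge runs from node i to node i + 1 and carries
    the mass of residue i in one of its allowed forms.
    """
    graph = []
    for residue in sequence:
        base = RESIDUE_MASS[residue]
        shifts = [0.0] + VARIABLE_MODS.get(residue, [])
        graph.append([base + shift for shift in shifts])
    return graph


def count_edges(graph):
    """Size of the representation: grows linearly with sequence length."""
    return sum(len(edges) for edges in graph)


def count_proteoforms(graph):
    """Number of start-to-end paths: the product of edge counts per step."""
    return prod(len(edges) for edges in graph)


graph = build_mass_graph("GSKASK")
print(count_edges(graph), count_proteoforms(graph))
```

In this toy model the number of edges grows additively with each modifiable residue while the number of represented proteoforms grows multiplicatively, which is the kind of compression the abstract attributes to the mass graph.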

Public Health Relevance

This project addresses the problem of identifying proteoforms using top-down mass spectrometry, both on its own and as part of top-down mass spectrometry-based proteogenomics. New data models and algorithms will be developed for high-throughput, proteome-wide identification of complex proteoforms with post-translational modifications and sequence variations. Software tools based on these algorithms will facilitate the decoding of complex proteoforms, such as those of histone proteins, and the discovery of proteome biomarkers.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Ravichandran, Veerasamy
Indiana University-Purdue University at Indianapolis
Schools of Arts and Sciences
United States
Li, Ziwei; He, Bo; Kou, Qiang et al. (2018) Evaluation of top-down mass spectral identification with homologous protein sequences. BMC Bioinformatics 19:494
McCool, Elijah N; Lubeckyj, Rachele A; Shen, Xiaojing et al. (2018) Deep Top-Down Proteomics Using Capillary Zone Electrophoresis-Tandem Mass Spectrometry: Identification of 5700 Proteoforms from the Escherichia coli Proteome. Anal Chem 90:5529-5533
McCool, Elijah N; Lubeckyj, Rachele; Shen, Xiaojing et al. (2018) Large-scale Top-down Proteomics Using Capillary Zone Electrophoresis Tandem Mass Spectrometry. J Vis Exp :
Shen, Xiaojing; Kou, Qiang; Guo, Ruiqiong et al. (2018) Native Proteomics in Discovery Mode Using Size-Exclusion Chromatography-Capillary Zone Electrophoresis-Tandem Mass Spectrometry. Anal Chem 90:10095-10099
Kou, Qiang; Wu, Si; Liu, Xiaowen (2018) Systematic Evaluation of Protein Sequence Filtering Algorithms for Proteoform Identification Using Top-Down Mass Spectrometry. Proteomics 18:
Yang, Runmin; Zhu, Daming; Kou, Qiang et al. (2017) A Spectrum Graph-Based Protein Sequence Filtering Algorithm for Proteoform Identification by Top-Down Mass Spectrometry. Proceedings (IEEE Int Conf Bioinformatics Biomed) 2017:222-229
Lubeckyj, Rachele A; McCool, Elijah N; Shen, Xiaojing et al. (2017) Single-Shot Top-Down Proteomics with Capillary Zone Electrophoresis-Electrospray Ionization-Tandem Mass Spectrometry for Identification of Nearly 600 Escherichia coli Proteoforms. Anal Chem 89:12059-12067
Qingge, Letu; Liu, Xiaowen; Zhong, Farong et al. (2017) Filling a Protein Scaffold With a Reference. IEEE Trans Nanobioscience 16:123-130
Fornelli, Luca; Ayoub, Daniel; Aizikov, Konstantin et al. (2017) Top-down analysis of immunoglobulin G isotypes 1 and 2 with electron transfer dissociation on a high-field Orbitrap mass spectrometer. J Proteomics 159:67-76
Ma, Hongyan; Delafield, Daniel G; Wang, Zhe et al. (2017) Finding Biomass Degrading Enzymes Through an Activity-Correlated Quantitative Proteomics Platform (ACPP). J Am Soc Mass Spectrom 28:655-663
