Mass spectrometry-based top-down proteomics has emerged as one of the most informative approaches in protein analysis because it provides a bird's-eye view of all intact proteoforms arising from post-translational modifications and sequence variations. A major challenge in proteoform identification by database search is the combinatorial explosion of possible proteoforms resulting from combinations of sequence variations, post-translational modifications, and other molecular events, such as protein degradation. Here, we propose a novel data model, called the mass graph, to efficiently represent a huge number of potential proteoforms, and we will design new mass graph-based alignment and filtering algorithms that precisely identify complex proteoforms at the proteome level. We will also develop a software pipeline that combines top-down mass spectrometry and RNA-Seq data to identify sample-specific proteoforms. The proposed research will be conducted by a group of researchers with complementary expertise. All the proposed algorithms will be implemented as user-friendly open-source software tools.

Public Health Relevance

This project addresses the problem of identifying proteoforms by top-down mass spectrometry and by top-down mass spectrometry-based proteogenomics. New data models and algorithms will be proposed for high-throughput, proteome-wide identification of complex proteoforms with post-translational modifications and sequence variations. Software tools based on these algorithms will facilitate the decoding of complex proteoforms, such as those of histone proteins, and the discovery of proteome biomarkers.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM118470-02
Application #: 9281842
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Ravichandran, Veerasamy
Project Start: 2016-06-01
Project End: 2020-05-31
Budget Start: 2017-06-01
Budget End: 2018-05-31
Support Year: 2
Fiscal Year: 2017
Total Cost: $262,866
Indirect Cost: $71,806

Name: Indiana University-Purdue University at Indianapolis
Department: Miscellaneous
Type: Schools of Arts and Sciences
DUNS #: 603007902
City: Indianapolis
State: IN
Country: United States
Zip Code: 46202

Shen, Xiaojing; Kou, Qiang; Guo, Ruiqiong et al. (2018) Native Proteomics in Discovery Mode Using Size-Exclusion Chromatography-Capillary Zone Electrophoresis-Tandem Mass Spectrometry. Anal Chem 90:10095-10099
Kou, Qiang; Wu, Si; Liu, Xiaowen (2018) Systematic Evaluation of Protein Sequence Filtering Algorithms for Proteoform Identification Using Top-Down Mass Spectrometry. Proteomics 18:
Li, Ziwei; He, Bo; Kou, Qiang et al. (2018) Evaluation of top-down mass spectral identification with homologous protein sequences. BMC Bioinformatics 19:494
McCool, Elijah N; Lubeckyj, Rachele A; Shen, Xiaojing et al. (2018) Deep Top-Down Proteomics Using Capillary Zone Electrophoresis-Tandem Mass Spectrometry: Identification of 5700 Proteoforms from the Escherichia coli Proteome. Anal Chem 90:5529-5533
McCool, Elijah N; Lubeckyj, Rachele; Shen, Xiaojing et al. (2018) Large-scale Top-down Proteomics Using Capillary Zone Electrophoresis Tandem Mass Spectrometry. J Vis Exp :
Fornelli, Luca; Ayoub, Daniel; Aizikov, Konstantin et al. (2017) Top-down analysis of immunoglobulin G isotypes 1 and 2 with electron transfer dissociation on a high-field Orbitrap mass spectrometer. J Proteomics 159:67-76
Ma, Hongyan; Delafield, Daniel G; Wang, Zhe et al. (2017) Finding Biomass Degrading Enzymes Through an Activity-Correlated Quantitative Proteomics Platform (ACPP). J Am Soc Mass Spectrom 28:655-663
Kou, Qiang; Wu, Si; Tolic, Nikola et al. (2017) A mass graph-based approach for the identification of modified proteoforms using top-down tandem mass spectra. Bioinformatics 33:1309-1316
Zhang, Xinjun; Li, Meng; Lin, Hai et al. (2017) regSNPs-splicing: a tool for prioritizing synonymous single-nucleotide substitution. Hum Genet 136:1279-1289
Yang, Runmin; Zhu, Daming; Kou, Qiang et al. (2017) A Spectrum Graph-Based Protein Sequence Filtering Algorithm for Proteoform Identification by Top-Down Mass Spectrometry. Proceedings (IEEE Int Conf Bioinformatics Biomed) 2017:222-229

Showing the most recent 10 out of 15 publications