Shotgun proteomics is one of the most commonly used approaches to MS-based biomarker discovery, owing to its high throughput and sensitivity. The general strategy involves simultaneous protease digestion of all proteins in a mixture, liquid chromatography-based separation of the resulting peptides, and analysis by tandem mass spectrometry (MS/MS) to produce a fragmentation spectrum for each peptide. Each experimental spectrum is searched against a protein database. Sequences that best match the experimental spectra are considered identified, and a set of reliably identified peptides from the same protein is necessary for a reliable protein identification. The main goal of the proposed work is to generate and interrogate MS/MS data from several proteomics platforms, including ESI/MS, MALDI/TOF/TOF, LC-IMS/TOF and MALDI-PID/TOF, in order to develop customized computational tools that address several challenging problems in shotgun proteomics data analysis: peptide identification, protein identification and label-free protein quantification. Our proposed approach is data-driven. At its core is the application of machine learning methods to the prediction of peptide fragmentation spectra, as well as of the likelihood that a peptide is detected in a typical proteomics experiment. Improved peptide identification, coupled with predicted peptide detectability, will then be used to develop new methods for improved protein identification and quantification. The methods proposed herein will be extensively evaluated, and the software will be made public both as web-based tools and as open-source deliverables. These software tools will enable researchers using proteomics technologies to study a variety of health-related conditions more effectively and efficiently. Such studies might entail disease diagnosis (biomarker discovery), disease progression (tissue profiling), or effects of treatment (drug-induced proteome changes).
These studies will enhance understanding of diseases and hasten the development of effective treatments and cures. In addition, the software will be useful for characterizing new analytical tools for proteome analysis. Here we propose to develop and extensively evaluate computational methodology that will be used to improve the interpretation of tandem mass spectrometry data.
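The identification workflow described in the abstract (digest proteins, match peptides, then require several reliable peptides per protein) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy sequences, the function names `tryptic_digest` and `infer_proteins`, the simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages), and the plain two-peptide threshold. It is not the machine learning or Bayesian methodology developed under this project, which also accounts for peptide detectability.

```python
from collections import defaultdict


def tryptic_digest(sequence, min_length=6):
    """In-silico digestion with a simplified trypsin rule (illustrative assumption):
    cleave after K or R unless the next residue is P; no missed cleavages."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return [p for p in peptides if len(p) >= min_length]


def infer_proteins(identified_peptides, proteins, min_peptides=2):
    """Report a protein only if at least `min_peptides` distinct identified
    peptides map to it (a simple counting rule, used here as a stand-in)."""
    support = defaultdict(set)
    for name, sequence in proteins.items():
        theoretical = set(tryptic_digest(sequence))
        for peptide in identified_peptides:
            if peptide in theoretical:
                support[name].add(peptide)
    return {
        name: sorted(peptides)
        for name, peptides in support.items()
        if len(peptides) >= min_peptides
    }


# Hypothetical toy database and peptide identifications.
proteins = {
    "PROT_A": "MAGTLLKSEQVNDPRLLTAGHEEKPWWYR",
    "PROT_B": "GGSTLVEKAAAPPLLR",
}
identified = {"MAGTLLK", "SEQVNDPR", "GGSTLVEK"}

result = infer_proteins(identified, proteins)
print(result)
```

In this toy case only PROT_A is reported, because PROT_B is supported by a single peptide. A counting rule like this ignores how likely each peptide is to be observed at all, which is the gap that predicted peptide detectability is meant to address.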

Agency: National Institutes of Health (NIH)
Institute: National Center for Research Resources (NCRR)
Type: Research Project (R01)
Project #: 5R01RR024236-03
Application #: 7916503
Study Section: Special Emphasis Panel (ZRG1-MSFD-N (01))
Program Officer: Sheeley, Douglas
Project Start: 2008-09-15
Project End: 2012-08-31
Budget Start: 2010-09-01
Budget End: 2012-08-31
Support Year: 3
Fiscal Year: 2010
Total Cost: $273,971
Indirect Cost:

Name: Indiana University Bloomington
Department:
Type: Other Domestic Higher Education
DUNS #: 006046700
City: Bloomington
State: IN
Country: United States
Zip Code: 47401

Xue, Liang; Wang, Pengcheng; Wang, Lianshui et al. (2013) Quantitative measurement of phosphoproteome response to osmotic stress in Arabidopsis based on Library-Assisted eXtracted Ion Chromatogram (LAXIC). Mol Cell Proteomics 12:2354-69
Ji, Chao; Arnold, Randy J; Sokoloski, Kevin J et al. (2013) Extending the coverage of spectral libraries: a neighbor-based approach to predicting intensities of peptide fragmentation spectra. Proteomics 13:756-65
Li, Yong Fuga; Arnold, Randy J; Radivojac, Predrag et al. (2012) Protein identification problem from a Bayesian point of view. Stat Interface 5:21-37
Li, Yong Fuga; Radivojac, Predrag (2012) Computational approaches to protein inference in shotgun proteomics. BMC Bioinformatics 13 Suppl 16:S4
Lai, Xianyin; Wang, Lianshui; Tang, Haixu et al. (2011) A novel alignment method and multiple filters for exclusion of unqualified peptides to enhance label-free quantification using peptide intensity in LC-MS/MS. J Proteome Res 10:4799-812
Liu, Xiaohui; Li, Yong Fuga; Bohrer, Brian C et al. (2011) Investigation of VUV Photodissociation Propensities Using Peptide Libraries. Int J Mass Spectrom 308:142-154
Li, Sujun; Arnold, Randy J; Tang, Haixu et al. (2011) On the accuracy and limits of peptide fragmentation spectrum prediction. Anal Chem 83:790-6
Li, Yong Fuga; Arnold, Randy J; Tang, Haixu et al. (2010) The importance of peptide detectability for protein identification, quantification, and experiment design in MS/MS proteomics. J Proteome Res 9:6288-97
Bohrer, Brian C; Li, Yong Fuga; Reilly, James P et al. (2010) Combinatorial libraries of synthetic peptides as a model for shotgun proteomics. Anal Chem 82:6559-68
Li, Yong Fuga; Arnold, Randy J; Li, Yixue et al. (2009) A Bayesian approach to protein inference problem in shotgun proteomics. J Comput Biol 16:1183-93