High-dimensional proteomic spectra generated by mass spectrometry instruments can potentially identify more sensitive and specific biomarkers for complex diseases such as cancer. However, the lack of sophisticated methods for automated analysis of mass spectra makes it difficult to interpret spectral data efficiently. Although some quantitative work has appeared in this area in recent years, very few algorithms have been developed with the underlying chemistry in mind. The overall goal of this project is to develop novel and improved statistical methods for analyzing high-dimensional proteomic data. In particular, this proposal focuses on 1) separating true isotopic peaks from chemical noise in a mass spectrum using statistical modeling and hypothesis tests; 2) comprehensively evaluating and aggregate-ranking a number of classification techniques for classifying case and control samples from the proteomic profiles of the detected peaks, and constructing an adaptive classifier that is expected to perform better than the individual classifiers under an ensemble of performance measures; and 3) constructing a protein-protein association network from the truly classifying peaks in a case-control study. We will incorporate knowledge of the isotopic composition of a protein molecule and fit a mixture of location-shifted Poisson distributions to the isotopic pattern of the polypeptide spectrum. We expect that the proposed methodology, across these three aspects of biomarker identification, will lead to a better understanding of the underlying biological system and will translate into more sensitive and specific proteomic biomarkers useful for treating cancer in the future. Moreover, the proposed methodologies are general enough to be adapted to other high-dimensional biological data generated from genomes, proteomes, and metabolomes.
This study will expose graduate students in the Department of Bioinformatics and Biostatistics, and many other students in the university's interdisciplinary PhD program, to the area of statistical proteomics.
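
As an illustration of the peak-modeling idea in the abstract, the sketch below fits a mixture of location-shifted Poisson distributions by expectation-maximization. It is a minimal sketch under stated assumptions, not the project's actual method: the abstract does not give a parameterization, so the function names, the treatment of shifts as known, the use of integer counts per position, and the synthetic data are all hypothetical.

```python
import math


def shifted_poisson_pmf(k, lam, shift):
    """P(X = k) for X = shift + Poisson(lam); zero below the shift."""
    j = k - shift
    if j < 0:
        return 0.0
    # Work in log space to avoid overflow in the factorial.
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))


def mixture_pmf(k, weights, lams, shifts):
    """Probability of position k under the weighted mixture."""
    return sum(w * shifted_poisson_pmf(k, l, s)
               for w, l, s in zip(weights, lams, shifts))


def fit_em(counts, shifts, n_iter=200):
    """Estimate mixture weights and Poisson rates by EM.

    counts: dict mapping integer position -> observed count (assumption).
    shifts: known location shift of each component (assumption).
    """
    m = len(shifts)
    weights = [1.0 / m] * m
    lams = [1.0] * m
    for _ in range(n_iter):
        resp_sum = [0.0] * m   # total responsibility per component
        resp_x = [0.0] * m     # responsibility-weighted sum of (k - shift)
        for k, c in counts.items():
            p = [w * shifted_poisson_pmf(k, l, s)
                 for w, l, s in zip(weights, lams, shifts)]
            total = sum(p)
            if total == 0.0:
                continue  # position not explained by any component
            for j in range(m):
                r = c * p[j] / total  # E-step, scaled by the count
                resp_sum[j] += r
                resp_x[j] += r * (k - shifts[j])
        n = sum(resp_sum)
        # M-step: closed-form updates for weights and rates.
        weights = [rs / n for rs in resp_sum]
        lams = [max(rx / rs, 1e-9) if rs > 0 else l
                for rx, rs, l in zip(resp_x, resp_sum, lams)]
    return weights, lams


# Synthetic example (hypothetical values): two components, shifts 0 and 10.
true_w, true_lam, shifts = [0.6, 0.4], [2.0, 3.0], [0, 10]
counts = {k: round(10000 * mixture_pmf(k, true_w, true_lam, shifts))
          for k in range(40)}
est_w, est_lam = fit_em(counts, shifts)
print("weights:", est_w)
print("rates:", est_lam)
```

With well-separated shifts, as in this synthetic example, the estimates should land close to the generating values. Real spectra would additionally require estimating the shifts, handling continuous intensities, and testing each candidate peak against a noise model, none of which is attempted here.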

Public Health Relevance

Our proposed research will help identify sensitive and specific proteomic biomarkers for complex diseases such as cancer. It will enhance the capacity to understand the molecular basis of cancer in general. Moreover, identifying the interrelationships among the proteins and peptides responsible for the disease may eventually lead to clinical interventions tailored to each individual patient.

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Academic Research Enhancement Awards (AREA) (R15)
Project #: 1R15CA133844-01A2
Application #: 7714790
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Li, Jerry
Project Start: 2009-07-01
Project End: 2013-06-30
Budget Start: 2009-07-01
Budget End: 2013-06-30
Support Year: 1
Fiscal Year: 2009
Total Cost: $222,000
Indirect Cost:

Name: University of Louisville
Department: Biostatistics & Other Math Sci
Type: Schools of Public Health
DUNS #: 057588857
City: Louisville
State: KY
Country: United States
Zip Code: 40292

Wu, You; Gaskins, Jeremy; Kong, Maiying et al. (2018) Profiling the effects of short time-course cold ischemia on tumor protein phosphorylation using a Bayesian approach. Biometrics 74:331-341
Gill, Ryan; Datta, Somnath; Datta, Susmita (2014) Differential network analysis in human cancer research. Curr Pharm Des 20:4-10
Datta, Susmita (2013) Feature selection and machine learning with mass spectrometry data. Methods Mol Biol 1007:237-62
Chakraborty, Sutirtha; Datta, Somnath; Datta, Susmita (2013) svapls: an R package to correct for hidden factors of variability in gene expression studies. BMC Bioinformatics 14:236
Ndukum, Juliet; Fonseca, Luis L; Santos, Helena et al. (2011) Statistical inference methods for sparse biological time series data. BMC Syst Biol 5:57
Li, Xiaohong; Gill, Ryan; Cooper, Nigel G F et al. (2011) Modeling microRNA-mRNA interactions using PLS regression in human colon cancer. BMC Med Genomics 4:44
Datta, Susmita; Datta, Somnath; Kim, Seongho et al. (2010) Statistical Analyses of Next Generation Sequence Data: A Partial Overview. J Proteomics Bioinform 3:183-190
Gill, Ryan; Datta, Somnath; Datta, Susmita (2010) A statistical framework for differential network analysis from microarray data. BMC Bioinformatics 11:95
Datta, Susmita; Pihur, Vasyl; Datta, Somnath (2010) An adaptive optimal ensemble classifier via bagging and rank aggregation with applications to high dimensional data. BMC Bioinformatics 11:427