High-dimensional proteomic spectra generated by mass spectrometry instruments can potentially identify more sensitive and specific biomarkers for complex diseases such as cancer. However, the lack of sophisticated methods for automated analysis of mass spectra makes it difficult to interpret spectral data efficiently. Although some quantitative work has appeared in this area in recent years, very few algorithms have been developed with the underlying chemistry in mind. The overall goal of this project is to develop novel and improved statistical methods for analyzing high-dimensional proteomic data. In particular, this proposal focuses on 1) separating the true isotopic peaks from chemical noise in a mass spectrum using statistical modeling and hypothesis tests; 2) comprehensive evaluation and aggregate ranking of a number of classification techniques for classifying case and control samples from the proteomic profiles of the detected peaks, and construction of an adaptive classifier that is expected to perform better than individual classifiers under an ensemble of performance measures; and 3) construction of a protein-protein association network from the truly classifying peaks in a case-control study. We will incorporate knowledge of the isotopic composition of a protein molecule and fit a mixture of location-shifted Poisson distributions to the isotopic pattern of the polypeptide spectrum. We expect that the proposed methodology, spanning these three aspects of biomarker identification, will lead to a better understanding of the underlying biological system and will translate into more sensitive and specific proteomic biomarkers useful for treating cancer in the future. Moreover, the proposed methodologies are general enough to be adapted to other high-dimensional biological data generated from genomes, proteomes, and metabolomes.
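To make the mixture of location-shifted Poisson distributions concrete, the following is a minimal sketch of how such a mixture could be fitted to an isotopic pattern. It is an illustration under stated assumptions, not the project's actual procedure: the proposal does not specify a fitting algorithm, so expectation-maximization (EM) is used here as one plausible choice; the location shifts are assumed known; peak intensities are treated as weights on integer isotope positions; and the function name `fit_shifted_poisson_mixture` and the synthetic parameters are hypothetical. The hypothesis tests for separating true peaks from chemical noise are not covered.

```python
import numpy as np
from scipy.stats import poisson


def fit_shifted_poisson_mixture(offsets, intensities, shifts, n_iter=200):
    """EM fit of a mixture of location-shifted Poisson distributions.

    offsets     : integer isotope positions (0 = first peak in the window)
    intensities : non-negative peak intensities, treated as weights
    shifts      : assumed-known integer location shift of each component
    Returns (weights, rates) of the mixture components.
    """
    k = np.asarray(offsets, dtype=int)
    y = np.asarray(intensities, dtype=float)
    s = np.asarray(shifts, dtype=int)
    n_comp = len(s)

    # Simple starting values (an assumption): equal weights, unit rates.
    weights = np.full(n_comp, 1.0 / n_comp)
    rates = np.ones(n_comp)

    for _ in range(n_iter):
        # E-step: responsibility of each component for each position.
        # poisson.pmf(..., loc=s) is zero for positions below the shift.
        dens = weights * poisson.pmf(k[:, None], rates, loc=s)
        resp = dens / np.maximum(dens.sum(axis=1, keepdims=True), 1e-300)

        # M-step: intensity-weighted updates of mixing weights and rates.
        mass = (y[:, None] * resp).sum(axis=0)
        weights = mass / mass.sum()
        shifted = y[:, None] * resp * (k[:, None] - s)
        rates = shifted.sum(axis=0) / np.maximum(mass, 1e-300)
        rates = np.maximum(rates, 1e-8)  # keep each rate strictly positive

    return weights, rates


# Synthetic, noise-free pattern from two overlapping isotopic envelopes
# (illustrative values only, not data from the study).
offsets = np.arange(31)
profile = 0.6 * poisson.pmf(offsets, 1.5) + 0.4 * poisson.pmf(offsets, 2.0, loc=5)
weights, rates = fit_shifted_poisson_mixture(offsets, 1e6 * profile, shifts=[0, 5])
print("mixing weights:", weights)
print("Poisson rates:", rates)
```

In this sketch each component stands for one polypeptide's isotopic envelope, with the shift marking its first isotope position. With real spectra the shifts would themselves have to be estimated or enumerated, and noise would call for the formal tests described above.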
This study will expose graduate students in the Department of Bioinformatics and Biostatistics, and many other students in the university's interdisciplinary PhD program, to the area of statistical proteomics.
Our proposed research will help identify sensitive and specific proteomic biomarkers for complex diseases such as cancer. This research will enhance the capacity to understand the molecular basis of cancer in general. Moreover, identifying the interrelationships among the proteins and peptides responsible for the disease may eventually result in clinical interventions tailored to each individual patient.