The Clinical Proteomic Technologies for Cancer initiative of the National Cancer Institute (NCI-CPTC) concluded that mass spectrometry (MS) is one of the main platforms for monitoring biological fluids, cells and tissues. Although there has been considerable progress in the accuracy of mass spectrometers, advanced quantitative algorithms and software are still needed to analyze mass spectrometry data and produce reproducible results. Moreover, while it is essential to identify individual oncoproteins, it is equally important to elucidate how these proteins interact. Experimental efforts to map the protein-protein interaction network (PIN), although valuable, have not produced consistent and reproducible results in the past, so there remains a need to predict the PIN computationally. Numerous computational prediction procedures exist for Yeast Two-Hybrid (Y2H) and Affinity Purification Mass Spectrometry (AP-MS) data, but these experiments are expensive. Here, we propose a completely different approach: building a large-scale protein co-occurrence interaction network (PCN) from basic fragmented-peptide MS/MS data. The rationale for the PCN is that if two proteins co-occur in a sample with high probability, they are more likely to interact. The PCN could therefore provide computational guidance on which expensive bait-prey experiments, such as Y2H and AP-MS, to conduct for experimental validation of a protein-protein interaction. In addition, we will compare the PCNs of diseased and normal samples to determine the protein groups most likely to serve as biomarkers or drug targets for cancer treatment. Our goals in this proposal are three-fold:
1) Propose a novel hierarchical Bayesian statistical methodology to identify proteins from MS/MS spectra obtained from fluid and tissue samples of cervical and breast cancer patients.
2) Use an expanded hierarchical Bayesian model to construct the PCN of the cancer proteome, and unravel key topological features of PCNs such as hubs, modules, sub-networks and bottlenecks.
3) Use statistical inference to compare the overall structure and the key features of the PCNs between diseased and control samples.
We anticipate that completion of the project will enhance our understanding of cervical and breast cancer and aid in developing novel cancer drugs. The project will also train doctoral students in twenty-first-century computational biomedical research.
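The co-occurrence idea behind the PCN can be illustrated with a minimal Python sketch. It is not the hierarchical Bayesian model proposed above: as a simplifying assumption, each sample is reduced to the set of proteins already identified in it, the probability that a pair co-occurs is estimated by the posterior mean (k + 1)/(n + 2) under a uniform Beta(1, 1) prior (k samples containing both proteins out of n), and an edge is kept when that estimate exceeds a threshold. Hubs are taken as the highest-degree proteins, and the diseased-versus-control comparison is reduced to edges present in only one network. The function names, the threshold of 0.5 and the protein IDs P1 to P4 are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def cooccurrence_network(samples, threshold=0.5):
    """Build a simple protein co-occurrence network (illustrative sketch).

    samples: list of sets; each set holds the protein IDs identified in one sample.
    Returns {(a, b): p} for pairs whose estimated co-occurrence probability p
    exceeds `threshold`. p is the posterior mean under a Beta(1, 1) prior:
    (k + 1) / (n + 2), with k = samples containing both a and b, n = all samples.
    """
    n = len(samples)
    proteins = sorted(set().union(*samples))  # sorted so each pair is (a, b) with a < b
    edges = {}
    for a, b in combinations(proteins, 2):
        k = sum(1 for s in samples if a in s and b in s)
        p = (k + 1) / (n + 2)
        if p > threshold:
            edges[(a, b)] = p
    return edges


def hubs(edges, top=1):
    """Return the `top` proteins with the highest degree (ties broken by ID)."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return sorted(degree, key=lambda prot: (-degree[prot], prot))[:top]


def differential_edges(edges_disease, edges_control):
    """Edges found only in the diseased network and only in the control network."""
    d, c = set(edges_disease), set(edges_control)
    return sorted(d - c), sorted(c - d)


if __name__ == "__main__":
    # Hypothetical per-sample protein identifications.
    disease = cooccurrence_network(
        [{"P1", "P2", "P3"}, {"P1", "P2"}, {"P1", "P2", "P4"}, {"P1", "P3"}]
    )
    control = cooccurrence_network(
        [{"P1", "P3"}, {"P1", "P2", "P3"}, {"P1", "P3"}, {"P2"}]
    )
    print(differential_edges(disease, control))  # ([('P1', 'P2')], [('P1', 'P3')])
    print(hubs(disease))  # ['P1']
```

A fixed threshold on the posterior mean is the simplest possible edge rule. A formal comparison between groups, as in goal 3, would instead test whether edge probabilities or topological features differ beyond what sampling variability explains.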

Public Health Relevance

Our proposed research will help identify sensitive and specific proteomic biomarkers for complex diseases such as cancer. Identifying the interrelationships among the proteins and peptides responsible for the disease may eventually lead to clinical interventions tailored to each individual patient. This proposal will also help build a trained workforce for twenty-first-century biomedical research.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Academic Research Enhancement Awards (AREA) (R15)
Study Section: Special Emphasis Panel (ZRG1-HDM-R (90))
Program Officer: Li, Jerry
University of Louisville
Biostatistics & Other Math Sci
Schools of Public Health
United States
Wan, Yubing; Datta, Susmita; Lee, J Jack et al. (2016) Monotonic single-index models to assess drug interactions. Stat Med
Sikdar, Sinjini; Gill, Ryan; Datta, Susmita (2016) Improving protein identification from tandem mass spectrometry data by one-step methods and integrating data from other platforms. Brief Bioinform 17:262-9
Wan, Y; Datta, S; Conklin, D J et al. (2015) Variable selection models based on multiple imputation with an application for predicting median effective dose and maximum effect. J Stat Comput Simul 85:1902-1916
Kong, Maiying; Xu, Sheng; Levy, Steven M et al. (2015) GEE type inference for clustered zero-inflated negative binomial regression with application to dental caries. Comput Stat Data Anal 85:54-66
Kujala, Maiju; Nevalainen, Jaakko; März, Winfried et al. (2015) Differential network analysis with multiply imputed lipidomic data. PLoS One 10:e0121449
Gill, Ryan; Datta, Somnath; Datta, Susmita (2014) Differential network analysis in human cancer research. Curr Pharm Des 20:4-10
Chakraborty, Sutirtha; Datta, Somnath; Datta, Susmita (2013) svapls: an R package to correct for hidden factors of variability in gene expression studies. BMC Bioinformatics 14:236