The Clinical Proteomic Technologies for Cancer initiative of the National Cancer Institute (NCI-CPTC) concluded that mass spectrometry (MS) is one of the main platforms for monitoring biological fluids, cells and tissues. Although there has been considerable progress in acquiring accurate mass spectrometers, there is still a need to develop advanced quantitative algorithms and software that analyze mass spectrometry data and lead to reproducible results. Additionally, although it is essential to identify individual onco-proteins, it is also pertinent to elucidate the interplay of these proteins. Experimental initiatives to understand the protein-protein interaction network (PIN), although valuable, have not produced consistent and reproducible results in the past. Hence, there continues to be a need to predict the PIN computationally. There are numerous computational prediction procedures for Yeast Two Hybrid (Y2H) data and Affinity Purification Mass Spectrometry (AP-MS) data. However, these experiments are expensive. Here, we propose a completely different approach: building a large-scale protein co-occurrence interaction network (PCN) from basic fragmented-peptide MS/MS data. The logic behind this PCN is that if two proteins co-occur in a sample with high probability, then their chance of interaction is higher. Hence, the PCN could provide computational guidance for conducting expensive bait-prey experiments such as Y2H and AP-MS to validate a protein-protein interaction experimentally. In addition, we compare the differential nature of the PCN for the diseased and the normal samples to determine the most important protein groups that are potential biomarkers or drug targets for cancer treatments. Our goals in this proposal are three-fold: 1) Propose a novel hierarchical Bayesian statistical methodology to identify proteins from MS/MS spectra obtained from fluids and tissue samples of cervical and breast cancer patients. 2) Use an expanded hierarchical Bayesian model to construct the PCN of the cancer proteome; additionally, unravel the key topological features of PCNs such as hubs, modules, sub-networks and bottlenecks. 3) Use statistical inference to differentiate the overall structure and the key features of the PCNs between the diseased and control samples. We anticipate that completion of the project will enhance our understanding of cervical and breast cancer and aid in developing novel cancer drugs. Additionally, it will train doctoral students in the field of computational biomedical research of the twenty-first century.
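
The co-occurrence logic described above can be illustrated with a minimal sketch. It is not the hierarchical Bayesian model proposed here: it substitutes a plain empirical co-occurrence frequency with a fixed cutoff, and the function names, the threshold value and the protein labels (P1 to P4) are hypothetical, chosen only for illustration. Each sample is assumed to be the set of proteins already identified from its MS/MS spectra.

```python
from itertools import combinations


def cooccurrence_network(samples, threshold=0.5):
    """Build a toy protein co-occurrence network.

    samples   : list of sets, each holding the protein IDs identified in one sample
    threshold : hypothetical cutoff on the fraction of samples containing both proteins
    Returns a dict mapping (protein_a, protein_b) -> co-occurrence fraction.
    """
    n = len(samples)
    if n == 0:
        return {}
    proteins = sorted(set().union(*samples))
    edges = {}
    for a, b in combinations(proteins, 2):
        # Empirical frequency with which both proteins appear in the same sample
        both = sum(1 for s in samples if a in s and b in s)
        frac = both / n
        if frac >= threshold:
            edges[(a, b)] = frac
    return edges


def node_degrees(edges):
    """Count the edges touching each protein; high-degree nodes are candidate hubs."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg


# Hypothetical toy data: four samples, four proteins
toy_samples = [{"P1", "P2", "P3"}, {"P1", "P2"}, {"P1", "P3"}, {"P2", "P4"}]
toy_edges = cooccurrence_network(toy_samples, threshold=0.5)
toy_degrees = node_degrees(toy_edges)
toy_hub = max(toy_degrees, key=toy_degrees.get)
```

In this toy data, P1 co-occurs with P2 and with P3 in half of the samples, so it ends up as the only node with two edges. Building such a network separately for diseased and control samples and comparing the resulting edges and degrees gives a crude picture of the differential analysis described in the third goal.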

Public Health Relevance

Our proposed research will help identify sensitive and specific proteomic biomarkers for complex diseases like cancer. Identifying the interrelationships between the different proteins and peptides responsible for the disease may eventually result in clinical interventions custom-made for each individual patient. This proposal will also contribute towards building a trained workforce for twenty-first-century biomedical research.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Academic Research Enhancement Awards (AREA) (R15)
Study Section
Special Emphasis Panel (ZRG1-HDM-R (90))
Program Officer
Li, Jerry
University of Louisville
Biostatistics & Other Math Sci
Schools of Public Health
United States
Wu, You; Gaskins, Jeremy; Kong, Maiying et al. (2018) Profiling the effects of short time-course cold ischemia on tumor protein phosphorylation using a Bayesian approach. Biometrics 74:331-341
Walker, Alejandro R; Grimes, Tyler L; Datta, Somnath et al. (2018) Unraveling bacterial fingerprints of city subways from microbiome 16S gene profiles. Biol Direct 13:10
Pesonen, Maiju; Nevalainen, Jaakko; Potter, Steven et al. (2018) A Combined PLS and Negative Binomial Regression Model for Inferring Association Networks from Next-Generation Sequencing Count Data. IEEE/ACM Trans Comput Biol Bioinform 15:760-773
Sekula, Michael; Datta, Somnath; Datta, Susmita (2017) optCluster: An R Package for Determining the Optimal Clustering Algorithm. Bioinformation 13:101-103
Wan, Yubing; Datta, Susmita; Lee, J Jack et al. (2017) Monotonic single-index models to assess drug interactions. Stat Med 36:655-670
Dutta, Sandipan; Datta, Susmita; Datta, Somnath (2017) Temporal Prediction of Future State Occupation in a Multistate Model from High-Dimensional Baseline Covariates via Pseudo-Value Regression. J Stat Comput Simul 87:1363-1378
Sikdar, Sinjini; Datta, Susmita (2017) A novel statistical approach for identification of the master regulator transcription factor. BMC Bioinformatics 18:79
Sikdar, Sinjini; Datta, Somnath; Datta, Susmita (2016) Exploring the importance of cancer pathways by meta-analysis of differential protein expression networks in three different cancers. Biol Direct 11:65
Sikdar, Sinjini; Gill, Ryan; Datta, Susmita (2016) Improving protein identification from tandem mass spectrometry data by one-step methods and integrating data from other platforms. Brief Bioinform 17:262-9
Siriwardhana, Chathura; Datta, Susmita; Datta, Somnath (2016) Inter-platform concordance of gene expression data for the prediction of chemical mode of action. Biol Direct 11:67
