The Clinical Proteomic Technologies for Cancer initiative of the National Cancer Institute (NCI-CPTC) concluded that mass spectrometry (MS) is one of the main platforms for monitoring biological fluids, cells and tissues. Although there has been considerable progress in acquiring accurate mass spectrometers, there is still a need for advanced quantitative algorithms and software that analyze mass spectrometry data and lead to reproducible results. Additionally, although it is essential to identify individual onco-proteins, it is also pertinent to elucidate the interplay of these proteins. Experimental initiatives to understand the protein-protein interaction network (PIN), although valuable, did not produce consistent and reproducible results in the past. Hence, there continues to be a need to predict the PIN computationally. There are numerous computational prediction procedures for Yeast Two Hybrid (Y2H) data and Affinity Purification Mass Spectrometry (AP-MS) data. However, these experiments are expensive. Here, we propose a completely different approach: building a large-scale protein co-occurrence interaction network (PCN) from basic fragmented peptide MS/MS data. The logic behind this PCN is that if two proteins co-occur in a sample with high probability, then their chance of interaction is higher. Hence, the PCN could provide computational guidance for conducting expensive bait-prey experiments like Y2H and AP-MS for experimental validation of a protein-protein interaction. In addition, we compare the PCNs of the diseased and the normal samples to determine the most important protein groups that are potential biomarkers or drug targets for cancer treatments. Our goals in this proposal are three-fold: 1) Propose a novel hierarchical Bayesian statistical methodology to identify proteins from MS/MS spectra obtained from fluid and tissue samples of cervical and breast cancer patients. 2) Use an expanded hierarchical Bayesian model to construct the PCN of the cancer proteome; additionally, unravel the key topological features of PCNs such as hubs, modules, sub-networks and bottlenecks. 3) Use statistical inference to differentiate the overall structure and the key features of the PCNs between the diseased and control samples. We anticipate that completion of the project will enhance our understanding of cervical and breast cancer and aid in developing novel cancer drugs. Additionally, it will train doctoral students in the field of twenty-first-century computational biomedical research.
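The co-occurrence idea can be illustrated with a minimal sketch. This is not the hierarchical Bayesian model proposed above; it swaps in a plain empirical frequency (the fraction of samples in which two proteins are both detected) as a stand-in for the co-occurrence probability. The function names, the threshold, and the toy protein IDs are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(samples, threshold=0.5):
    """Build a toy co-occurrence network.

    samples: list of sets, each holding the protein IDs detected in one sample.
    Returns a dict mapping a sorted protein pair to the fraction of samples
    containing both proteins, keeping only pairs at or above `threshold`.
    (Assumption: an empirical frequency replaces the Bayesian estimate.)
    """
    n = len(samples)
    pair_counts = Counter()
    for detected in samples:
        for a, b in combinations(sorted(detected), 2):
            pair_counts[(a, b)] += 1
    return {
        pair: count / n
        for pair, count in pair_counts.items()
        if count / n >= threshold
    }


def degrees(edges):
    """Count how many edges touch each protein; high-degree nodes are hub candidates."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def differential_edges(diseased_edges, control_edges):
    """Return the pairs that appear in the diseased network but not in the control one."""
    return set(diseased_edges) - set(control_edges)


# Hypothetical toy data: proteins detected in four diseased and four control samples.
diseased = [{"P1", "P2", "P3"}, {"P1", "P2"}, {"P1", "P3"}, {"P1", "P4"}]
control = [{"P1", "P2"}, {"P1", "P2"}, {"P3", "P4"}, {"P2"}]

diseased_net = cooccurrence_network(diseased, threshold=0.5)
control_net = cooccurrence_network(control, threshold=0.5)

print(diseased_net)                                   # retained pairs and their frequencies
print(degrees(diseased_net).most_common())            # proteins ranked by degree
print(differential_edges(diseased_net, control_net))  # edges specific to the diseased network
```

In this toy example, P1 has the highest degree in the diseased network, and the P1-P3 edge is the only one absent from the control network. A real analysis would replace the raw frequency with a model-based probability and a formal test for differences between networks.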

Public Health Relevance

Our proposed research will help identify sensitive and specific proteomic biomarkers for complex diseases like cancer. Identifying the interrelationships between the proteins and peptides responsible for the disease may eventually result in clinical interventions tailored to each individual patient. This proposal will also contribute to building a trained workforce for twenty-first-century biomedical research.

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Academic Research Enhancement Awards (AREA) (R15)
Project #: 1R15CA170091-01A1
Application #: 8496581
Study Section: Special Emphasis Panel (ZRG1-HDM-R (90))
Program Officer: Li, Jerry
Project Start: 2013-03-04
Project End: 2016-02-28
Budget Start: 2013-03-04
Budget End: 2016-02-28
Support Year: 1
Fiscal Year: 2013
Total Cost: $463,508
Indirect Cost: $128,892

Name: University of Louisville
Department: Biostatistics & Other Math Sci
Type: Schools of Public Health
DUNS #: 057588857
City: Louisville
State: KY
Country: United States
Zip Code: 40292

Sekula, Michael; Datta, Somnath; Datta, Susmita (2017) optCluster: An R Package for Determining the Optimal Clustering Algorithm. Bioinformation 13:101-103
Wan, Yubing; Datta, Susmita; Lee, J Jack et al. (2017) Monotonic single-index models to assess drug interactions. Stat Med 36:655-670
Dutta, Sandipan; Datta, Susmita; Datta, Somnath (2017) Temporal Prediction of Future State Occupation in a Multistate Model from High-Dimensional Baseline Covariates via Pseudo-Value Regression. J Stat Comput Simul 87:1363-1378
Sikdar, Sinjini; Datta, Susmita (2017) A novel statistical approach for identification of the master regulator transcription factor. BMC Bioinformatics 18:79
Pesonen, Maiju; Nevalainen, Jaakko; Potter, Steven et al. (2017) A Combined PLS and Negative Binomial Regression Model for Inferring Association Networks from Next-generation Sequencing Count Data. IEEE/ACM Trans Comput Biol Bioinform :
Siriwardhana, Chathura; Datta, Susmita; Datta, Somnath (2016) Inter-platform concordance of gene expression data for the prediction of chemical mode of action. Biol Direct 11:67
Sikdar, Sinjini; Datta, Somnath; Datta, Susmita (2016) Exploring the importance of cancer pathways by meta-analysis of differential protein expression networks in three different cancers. Biol Direct 11:65
Sikdar, Sinjini; Gill, Ryan; Datta, Susmita (2016) Improving protein identification from tandem mass spectrometry data by one-step methods and integrating data from other platforms. Brief Bioinform 17:262-269
Kujala, Maiju; Nevalainen, Jaakko; März, Winfried et al. (2015) Differential network analysis with multiply imputed lipidomic data. PLoS One 10:e0121449
Wan, Y; Datta, S; Conklin, D J et al. (2015) Variable selection models based on multiple imputation with an application for predicting median effective dose and maximum effect. J Stat Comput Simul 85:1902-1916

Showing the most recent 10 out of 18 publications