Identification of Proteins from Mass Spectrometry Data: A Statistical Approach

Datta, Susmita

Abstract

The Clinical Proteomic Technologies for Cancer initiative of the National Cancer Institute (NCI-CPTC) concluded that mass spectrometry (MS) is one of the main platforms for monitoring biological fluids, cells and tissues. Although there has been considerable progress in acquiring accurate mass spectrometers, there is still a need to develop advanced quantitative algorithms and software that analyze mass spectrometry data and lead to reproducible results. Additionally, although it is essential to identify individual onco-proteins, it is also pertinent to elucidate the interplay of these proteins. Experimental initiatives to understand the protein-protein interaction network (PIN), although valuable, have not produced consistent and reproducible results. Hence, there continues to be a need to predict the PIN computationally. There are numerous computational prediction procedures for Yeast Two Hybrid (Y2H) data and Affinity Purification Mass Spectrometry (AP-MS) data. However, these experiments are expensive. Here, we propose a completely different approach: building a large-scale protein co-occurrence interaction network (PCN) from basic fragmented peptide MS/MS data. The logic behind this PCN is that if two proteins co-occur in a sample with high probability, then their chance of interaction is higher. Hence, the PCN could provide computational guidance for conducting expensive bait-prey experiments like Y2H and AP-MS for experimental validation of a protein-protein interaction. In addition, we compare how the PCN differs between the diseased and the normal samples to determine the most important protein groups that are potential biomarkers or drug targets for cancer treatments. Our goals in this proposal are three-fold: 1) Propose a novel hierarchical Bayesian statistical methodology to identify proteins from MS/MS spectra obtained from fluids and tissue samples of cervical and breast cancer patients. 2) Use an expanded hierarchical Bayesian model to construct the PCN of the cancer proteome; additionally, unravel the key topological features of PCNs such as hubs, modules, sub-networks and bottlenecks. 3) Use statistical inference to differentiate the overall structure and the key features of the PCNs between the diseased and control samples. We anticipate that completion of the project will enhance our understanding of cervical and breast cancer and aid in developing novel cancer drugs. Additionally, it will train doctoral students in twenty-first-century computational biomedical research.
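
The co-occurrence logic described above can be illustrated with a small sketch. This is not the hierarchical Bayesian model the proposal describes. It is a much simpler stand-in that estimates co-occurrence by raw sample frequency, assuming each sample has already been reduced to a set of identified proteins. The function name `cooccurrence_edges`, the `threshold` parameter, and the protein labels `P1` to `P4` are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(samples, threshold=0.5):
    """Return protein pairs whose empirical co-occurrence frequency
    across samples is at least `threshold`.

    samples: list of sets, each holding the protein IDs identified
             in one sample (an assumed, simplified input format).
    """
    n = len(samples)
    if n == 0:
        return {}
    pair_counts = Counter()
    for proteins in samples:
        # Sort so each unordered pair always gets the same key.
        for a, b in combinations(sorted(proteins), 2):
            pair_counts[(a, b)] += 1
    # Keep only pairs that co-occur in a large enough share of samples.
    return {
        pair: count / n
        for pair, count in pair_counts.items()
        if count / n >= threshold
    }


# Hypothetical toy data: four samples with placeholder protein IDs.
samples = [
    {"P1", "P2", "P3"},
    {"P1", "P2"},
    {"P2", "P3"},
    {"P1", "P2", "P4"},
]
edges = cooccurrence_edges(samples, threshold=0.5)
print(edges)
```

Running the same function separately on diseased and control samples and comparing the resulting edge sets would give a rough picture of the differential comparison in goal 3, although a frequency threshold carries none of the uncertainty modeling that a Bayesian treatment would provide.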

Public Health Relevance

Our proposed research will help identify sensitive and specific proteomic biomarkers for complex diseases like cancer. Identifying the interrelationships between the proteins and peptides responsible for the disease may eventually result in clinical interventions tailored to each individual patient. This proposal will also contribute to building a trained workforce for twenty-first-century biomedical research.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Academic Research Enhancement Awards (AREA) (R15)
Project #: 1R15CA170091-01A1
Application #: 8496581
Study Section: Special Emphasis Panel (ZRG1-HDM-R (90))
Program Officer: Li, Jerry

Project Start: 2013-03-04
Project End: 2016-02-28
Budget Start: 2013-03-04
Budget End: 2016-02-28
Support Year: 1
Fiscal Year: 2013
Total Cost: $463,508
Indirect Cost: $128,892

Institution

Name: University of Louisville
Department: Biostatistics & Other Math Sci
Type: Schools of Public Health
DUNS #: 057588857

City: Louisville
State: KY
Country: United States
Zip Code: 40292

Publications

Wu, You; Gaskins, Jeremy; Kong, Maiying et al. (2018) Profiling the effects of short time-course cold ischemia on tumor protein phosphorylation using a Bayesian approach. Biometrics 74:331-341

Walker, Alejandro R; Grimes, Tyler L; Datta, Somnath et al. (2018) Unraveling bacterial fingerprints of city subways from microbiome 16S gene profiles. Biol Direct 13:10

Pesonen, Maiju; Nevalainen, Jaakko; Potter, Steven et al. (2018) A Combined PLS and Negative Binomial Regression Model for Inferring Association Networks from Next-Generation Sequencing Count Data. IEEE/ACM Trans Comput Biol Bioinform 15:760-773

Sekula, Michael; Datta, Somnath; Datta, Susmita (2017) optCluster: An R Package for Determining the Optimal Clustering Algorithm. Bioinformation 13:101-103

Wan, Yubing; Datta, Susmita; Lee, J Jack et al. (2017) Monotonic single-index models to assess drug interactions. Stat Med 36:655-670

Dutta, Sandipan; Datta, Susmita; Datta, Somnath (2017) Temporal Prediction of Future State Occupation in a Multistate Model from High-Dimensional Baseline Covariates via Pseudo-Value Regression. J Stat Comput Simul 87:1363-1378

Sikdar, Sinjini; Datta, Susmita (2017) A novel statistical approach for identification of the master regulator transcription factor. BMC Bioinformatics 18:79

Sikdar, Sinjini; Datta, Somnath; Datta, Susmita (2016) Exploring the importance of cancer pathways by meta-analysis of differential protein expression networks in three different cancers. Biol Direct 11:65

Sikdar, Sinjini; Gill, Ryan; Datta, Susmita (2016) Improving protein identification from tandem mass spectrometry data by one-step methods and integrating data from other platforms. Brief Bioinform 17:262-9

Siriwardhana, Chathura; Datta, Susmita; Datta, Somnath (2016) Inter-platform concordance of gene expression data for the prediction of chemical mode of action. Biol Direct 11:67

Showing the most recent 10 out of 20 publications
