Latent Dirichlet Allocation for Protein Inference in Quantitative Proteomics

Cohen, Aaron

Abstract

One way to accelerate the understanding of the molecular basis of cancer is through the application of robust, quantitative proteomic technologies and corresponding computational methodologies. Mass spectrometry measurement technology for peptides (LC-MS/MS) is rapidly advancing, and there is a great need for further development of the corresponding bioinformatics analysis techniques to infer proteins from the peptide spectra. The Latent Dirichlet Allocation (LDA) for Protein Inference in Quantitative Proteomics research project will adapt LDA, an established method of topic modeling from text mining, to the problem of protein inference. Advances in protein inference will be of great utility and interest in cancer clinical proteomics studies. Successfully deploying these methods will directly increase the ability of proteomics to augment cancer research in many important areas such as biomarker discovery, pathogenesis, and patient-specific tumor therapies.
Two specific aims in support of these goals will be undertaken during the proposed project:

* Aim 1. Investigate how best to apply latent Dirichlet allocation modeling techniques previously used in text mining to the problem of protein inference. Areas to explore include the application of biological and domain knowledge constraints to the model as well as parameter optimization techniques. Tune and evaluate the approach in terms of accuracy, sensitivity, and specificity on a set of simulated protein-peptide fragment data with various amounts of noise and errors in the peptide reading process. Further evaluation and validation will be performed using LC-MS/MS data produced from proteomic laboratory standards that provide a known solution to complex real-world data samples.
* Aim 2. Demonstrate the utility of the latent Dirichlet allocation-based protein inference techniques by application to experimental cancer data. A head and neck squamous cell carcinoma (SCC) study from the Vanderbilt-Ingram Cancer Center that provides public data will be used, allowing the results obtained with LDA to be compared with those obtained by current standard techniques in terms of prediction overlap, differences, and confidence levels.
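The evaluation criteria named in the two aims can be illustrated with a small sketch. It assumes that an inference method outputs a set of protein identifiers, that a known ground-truth set is available (as with simulated data or laboratory standards), and that all proteins are drawn from a fixed candidate list. The function names, the identifiers `P1` to `P8`, and the use of a Jaccard index for "prediction overlap" are hypothetical choices made for illustration; the abstract does not specify how these quantities will be computed.

```python
from typing import Dict, Set


def evaluate_inference(predicted: Set[str], truth: Set[str],
                       candidates: Set[str]) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity of a predicted protein set.

    Assumption: every protein in `candidates` is either present (in `truth`)
    or absent, so true negatives are candidates in neither set.
    """
    predicted = predicted & candidates  # ignore IDs outside the candidate list
    truth = truth & candidates
    tp = len(predicted & truth)               # present and reported
    fp = len(predicted - truth)               # absent but reported
    fn = len(truth - predicted)               # present but missed
    tn = len(candidates - predicted - truth)  # absent and not reported
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else float("nan"),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


def compare_predictions(a: Set[str], b: Set[str]) -> Dict[str, object]:
    """Overlap and differences between two methods' predicted protein sets."""
    union = a | b
    return {
        "shared": a & b,
        "only_a": a - b,
        "only_b": b - a,
        # Jaccard index: one possible (assumed) summary of prediction overlap.
        "jaccard": len(a & b) / len(union) if union else float("nan"),
    }


# Hypothetical toy data: eight candidate proteins, four truly present.
candidates = {f"P{i}" for i in range(1, 9)}
truth = {"P1", "P2", "P3", "P4"}
method_a = {"P1", "P2", "P3", "P5"}  # e.g. an LDA-based method (assumed)
method_b = {"P1", "P2", "P4"}        # e.g. a standard method (assumed)

metrics = evaluate_inference(method_a, truth, candidates)
overlap = compare_predictions(method_a, method_b)
print(metrics)
print(overlap)
```

Set-based metrics of this kind cover only presence or absence of a protein; comparing confidence levels, as Aim 2 also mentions, would need per-protein scores that this sketch does not model.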

Public Health Relevance

This project will improve the ability of scientists and researchers to accurately and comprehensively analyze the protein content of biological specimens, including patient medical samples. These improvements will accelerate progress in personalized medicine for cancer, chronic diseases such as diabetes mellitus, and other disease areas by providing a new perspective and detailed view into biological processes necessary for health.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Exploratory/Developmental Grants (R21)
Project #: 5R21CA181382-02
Application #: 8771434
Study Section: Special Emphasis Panel (ZCA1)
Program Officer: Li, Jerry

Project Start: 2014-01-01
Project End: 2016-12-31
Budget Start: 2015-01-01
Budget End: 2016-12-31
Support Year: 2
Fiscal Year: 2015
Total Cost
Indirect Cost

Institution

Name: Oregon Health and Science University
Department: Biostatistics & Other Math Sci
Type: Schools of Medicine
DUNS #: 096997515

City: Portland
State: OR
Country: United States
Zip Code: 97239

Related projects


NIH 2015 R21 CA	Latent Dirichlet Allocation for Protein Inference in Quantitative Proteomics Cohen, Aaron M. / Oregon Health and Science University
NIH 2014 R21 CA	Latent Dirichlet Allocation for Protein Inference in Quantitative Proteomics Cohen, Aaron M. / Oregon Health and Science University	$200,970
