Building and Validating Location Proteomics Databases

Murphy, Robert

Abstract

This proposal for a competitive revision is being submitted in response to Notice Number NOT-OD-09-058, "NIH Announces the Availability of Recovery Act Funds for Competitive Revision Applications." The goal of the current R01, GM075205, is to determine the subcellular location of thousands of proteins in NIH 3T3 cells using automated fluorescence microscopy and machine learning. We have created an extensive database of images and analysis results and continue to add new proteins to it at a rate of approximately 100 per week. The current project addresses a significant need, both for understanding protein function and for creating predictive models of cell behavior. The proposed revision addresses an important related problem: learning how protein locations change under a very large number of conditions. There are at least tens of thousands of conditions that could cause changes (e.g., mutations in any of tens of thousands of genes, or the presence of any of thousands of drugs), and these changes could occur over time frames spanning orders of magnitude, so the scope of the problem is enormous. It is also a critical problem to address, given the number of cases in which alterations in subcellular location are already known to cause or be associated with disease. If all combinations of proteins, conditions and time frames are truly (or largely) independent and must each be measured to find out whether they produce a change, the problem is unlikely ever to be solved. However, we expect that there are correlations among these combinations that will allow us to predict the responses of particular proteins under particular conditions without measuring them directly.
The goal of this competitive revision is to demonstrate a way to do this in a concrete case that builds on work in the current grant. The three key components are a modeling approach that can efficiently learn the correlations between behaviors, a machine learning strategy (termed active learning) that iteratively chooses experiments to perform based on the current model, and automation to execute and interpret the experiments. We will use these components to build a model of the effects of approximately one hundred compounds on approximately one hundred cell lines, each expressing a different GFP-tagged protein. While this task could be achieved by brute force, we will determine the extent to which an accurate model can be created without performing all tests. We will then extend the model to all pairwise combinations of compounds, a task that cannot reasonably be performed by brute force. We anticipate that successful completion of the project will have a major impact on the way in which both biomarkers and pharmaceuticals are identified and developed, including a potentially enormous increase in the efficiency of the work being done through the extensive NIH-supported Molecular Libraries Screening Centers Network. The project will take advantage of the cell lines and methods being created under the existing grant, enabling it to be done far less expensively than if it were initiated as a standalone project, and it will also provide employment for U.S. citizens, consistent with the goals of the American Recovery and Reinvestment Act.
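To make the scale concrete, and to illustrate what an active learning loop over a compound-by-protein matrix can look like, here is a minimal Python sketch under stated assumptions. It is a hypothetical toy, not the project's model or software: the helper names (experiment_space, make_truth, predict, active_learning), the grouped "hidden" responses, the agreement-based voting model and the least-confident-first selection rule are all illustrative assumptions. The arithmetic itself follows from the abstract: about 100 compounds on about 100 cell lines is 100 × 100 = 10,000 single-compound tests, while all pairwise compound combinations on the same cell lines is C(100, 2) × 100 = 4,950 × 100 = 495,000 tests.

```python
import random
from math import comb


def experiment_space(n_compounds: int, n_proteins: int) -> tuple[int, int]:
    """Count single-compound tests and all pairwise-compound tests across n_proteins cell lines."""
    return n_compounds * n_proteins, comb(n_compounds, 2) * n_proteins


def make_truth(n_compounds, n_proteins, n_groups, n_phenotypes, rng):
    """Hypothetical hidden responses: compounds in the same group share one response profile,
    which supplies the correlations the toy model below tries to exploit."""
    profiles = [[rng.randrange(n_phenotypes) for _ in range(n_proteins)]
                for _ in range(n_groups)]
    return [profiles[rng.randrange(n_groups)] for _ in range(n_compounds)]


def predict(observed, c, p):
    """Return (predicted response, confidence margin) for compound c on protein p.

    Toy model: other compounds that agree with c on every protein measured for both
    vote with their measured response on p. Measured entries are returned as-is.
    """
    if p in observed[c]:
        return observed[c][p], float("inf")
    votes = {}
    for other, obs in enumerate(observed):
        if other == c or p not in obs:
            continue
        shared = observed[c].keys() & obs.keys()
        if shared and all(observed[c][q] == obs[q] for q in shared):
            votes[obs[p]] = votes.get(obs[p], 0) + 1
    if not votes:
        return None, -1.0  # no supporting evidence at all: least confident
    best = max(votes, key=votes.get)
    return best, float(2 * votes[best] - sum(votes.values()))  # winner minus all others


def active_learning(truth, budget, batch, rng):
    """Measure entries in batches, always choosing those the current model is least sure of.
    Assumes budget >= number of compounds (one seed measurement per compound)."""
    n_c, n_p = len(truth), len(truth[0])
    observed = [{} for _ in range(n_c)]
    for c in range(n_c):  # seed: one random protein per compound
        p = rng.randrange(n_p)
        observed[c][p] = truth[c][p]
    spent = n_c
    while spent < budget:
        # Score every unmeasured entry; rng.random() breaks ties between equal margins.
        candidates = sorted(
            (predict(observed, c, p)[1], rng.random(), c, p)
            for c in range(n_c) for p in range(n_p) if p not in observed[c])
        if not candidates:
            break
        for _, _, c, p in candidates[:min(batch, budget - spent)]:
            observed[c][p] = truth[c][p]  # stands in for performing the experiment
            spent += 1
    correct = sum(predict(observed, c, p)[0] == truth[c][p]
                  for c in range(n_c) for p in range(n_p))
    return spent, correct / (n_c * n_p)


if __name__ == "__main__":
    rng = random.Random(0)
    print(experiment_space(100, 100))  # (10000, 495000)
    truth = make_truth(20, 10, n_groups=4, n_phenotypes=3, rng=rng)
    spent, acc = active_learning(truth, budget=120, batch=10, rng=rng)
    print(f"measured {spent} of {20 * 10} entries; prediction accuracy {acc:.2f}")
```

In this toy the "model" is simply recomputed from the current measurements each round, and the lookup into the hidden matrix stands in for an imaging experiment; a real system would substitute a proper predictive model and automated acquisition and analysis. Comparing the accuracy reached at a given budget against randomly chosen experiments is one way such a sketch could be used to ask how far short of exhaustive testing an accurate model can be built.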

Public Health Relevance

Current approaches for measuring the effects of drugs, especially when used in combination, are not able to address the large number of potential targets that these drugs may have. The proposed work will use a sophisticated probabilistic model and an active learning approach to demonstrate how such effects can be learned without measuring all possible combinations of drugs and targets. The work has the potential to dramatically change the way cell-based assays are used in drug discovery.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 3R01GM075205-03S1
Application #: 7813483
Study Section: Special Emphasis Panel (ZRG1-BST-N (96))
Program Officer: Deatherage, James F

Project Start: 2009-09-30
Project End: 2011-05-31
Budget Start: 2009-09-30
Budget End: 2011-05-31
Support Year: 3
Fiscal Year: 2009
Total Cost: $510,300
Indirect Cost

Institution

Name: Carnegie-Mellon University
Department: Biology
Type: Schools of Arts and Sciences
DUNS #: 052184116

City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213

Related projects


NIH 2011 R01 GM	Building and Validating Location Proteomics Databases Murphy, Robert F. / Carnegie-Mellon University	$156,700
NIH 2010 R01 GM	Building and Validating Location Proteomics Databases Murphy, Robert F. / Carnegie-Mellon University	$257,276
NIH 2010 R01 GM	Building and Validating Location Proteomics Databases Murphy, Robert F. / Carnegie-Mellon University	$123,328
NIH 2009 R01 GM	Building and Validating Location Proteomics Databases Murphy, Robert F. / Carnegie-Mellon University	$259,875
NIH 2009 R01 GM	Building and Validating Location Proteomics Databases Murphy, Robert F. / Carnegie-Mellon University	$510,300
NIH 2008 R01 GM	Building and Validating Location Proteomics Databases Murphy, Robert F. / Carnegie-Mellon University	$259,875
NIH 2007 R01 GM	Building and Validating Location Proteomics Databases Murphy, Robert F. / Carnegie-Mellon University	$258,178

Publications

Murphy, Robert F (2016) Building cell models and simulations from microscope images. Methods 96:33-39

Naik, Armaghan W; Kangas, Joshua D; Sullivan, Devin P et al. (2016) Active machine learning-driven experimentation to determine compound effects on protein patterns. Elife 5:e10047

Johnson, Gregory R; Li, Jieyue; Shariff, Aabid et al. (2015) Automated Learning of Subcellular Variation among Punctate Protein Patterns and a Generative Model of Their Relation to Microtubules. PLoS Comput Biol 11:e1004614

Kumar, Aparna; Rao, Arvind; Bhavani, Santosh et al. (2014) Automated analysis of immunohistochemistry images identifies candidate location biomarkers for cancers. Proc Natl Acad Sci U S A 111:18249-54

Kangas, Joshua D; Naik, Armaghan W; Murphy, Robert F (2014) Efficient discovery of responses of proteins to compounds using active learning. BMC Bioinformatics 15:143

Naik, Armaghan W; Kangas, Joshua D; Langmead, Christopher J et al. (2013) Efficient modeling and active learning discovery of biological responses. PLoS One 8:e83996

Coelho, Luis Pedro; Kangas, Joshua D; Naik, Armaghan W et al. (2013) Determining the subcellular location of new proteins from microscope images using local features. Bioinformatics 29:2343-9

Li, Jieyue; Xiong, Liang; Schneider, Jeff et al. (2012) Protein subcellular location pattern classification in cellular images using latent discriminative models. Bioinformatics 28:i32-9

Buck, Taráz E; Li, Jieyue; Rohde, Gustavo K et al. (2012) Toward the virtual cell: automated approaches to building models of subcellular organization "learned" from microscopy images. Bioessays 34:791-9

Li, Jieyue; Shariff, Aabid; Wiking, Mikaela et al. (2012) Estimating microtubule distributions from 2D immunofluorescence microscopy images reveals differences among human cultured cell lines. PLoS One 7:e50292

The 10 most recent of 31 publications are listed.
