Principal Investigator/Program Director (Last, first, middle): Murphy, Robert F.

Project Summary/Abstract

This proposal for a competitive revision is being submitted in response to Notice Number NOT-OD-09-058 with Notice Title "NIH Announces the Availability of Recovery Act Funds for Competitive Revision Applications." The goal of the current R01, GM075205, is the determination via automated fluorescence microscopy and machine learning of the subcellular location of thousands of proteins in NIH 3T3 cells. We have created an extensive database of images and analysis results and continue to add new proteins to the database at the rate of approximately 100 per week. While the current project addresses a significant need both for understanding protein function and for creating predictive models of cell behaviors, the proposed revision addresses an important related problem: learning how protein locations change under a very large number of conditions. Given that there are at least tens of thousands of conditions that could cause changes (i.e., mutations in any of tens of thousands of genes or the presence of thousands of drugs), and that these changes could occur over time frames varying over orders of magnitude, the scope of the problem is enormous. It is also a critical problem to address given the number of cases already known in which alterations in subcellular location have been shown to cause or be associated with diseases. If all combinations of proteins, conditions and time frames are truly (or largely) independent and have to be measured in order to find out whether they result in changes, it is unlikely that this could ever be accomplished. However, we can hope and expect that there are correlations between these combinations that would permit us to predict the responses of particular proteins under particular conditions without having to measure them directly.
Demonstrating a way to do this in a concrete case, building on work in the current grant, is the goal of this competitive revision. The three key components are a modeling approach that can efficiently learn the correlations between behaviors, a machine learning strategy (termed active learning) that iteratively chooses experiments to perform based on the current model, and automation to execute and interpret the experiments. We will use these components to build a model of the effect of approximately one hundred compounds on approximately one hundred cell lines expressing different GFP-tagged proteins. While this task could be achieved by brute force, we will determine the extent to which an accurate model can be created without performing all tests. We will then extend the model to all pairwise combinations of compounds, a task that cannot reasonably be performed by brute force. We anticipate that successful completion of the project will have a major impact on the way in which both biomarkers and pharmaceuticals are identified and developed, including a potentially enormous increase in efficiency of work being done through the extensive NIH-supported Molecular Libraries Screening Centers Network. The project will take advantage of the cell lines and methods being created under the existing grant, allowing it to be done at far lower cost than if initiated as a standalone project, and will also provide employment for U.S. citizens consistent with the goals of the American Recovery and Reinvestment Act.
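
The loop described above (model, choose an experiment, measure, update) can be illustrated with a minimal sketch. This is not the proposal's probabilistic model: it assumes a toy compound-by-protein table with two hypothetical labels, substitutes a simple similarity-weighted vote for the model, and uses uncertainty sampling as the selection rule. All function and variable names here are invented for illustration.

```python
import random
from collections import defaultdict


def predict(observed, row, col, n_rows):
    """Guess the unmeasured cell (row, col); return (label, confidence).

    Stand-in model (an assumption, not the proposal's method): other
    compounds vote with their measured effect on this protein, weighted
    by how often they agree with `row` on proteins both were tested on.
    """
    votes = defaultdict(float)
    for other in range(n_rows):
        if other == row or (other, col) not in observed:
            continue
        shared = [c for (r, c) in observed if r == row and (other, c) in observed]
        if not shared:
            continue
        agree = sum(observed[(row, c)] == observed[(other, c)] for c in shared)
        votes[observed[(other, col)]] += agree / len(shared)
    total = sum(votes.values())
    if total == 0:
        return None, 0.0  # no usable evidence yet
    label = max(votes, key=votes.get)
    return label, votes[label] / total


def active_learning(truth, budget, seed=0):
    """Measure `budget` cells, always picking the least confident one."""
    rng = random.Random(seed)
    n_rows, n_cols = len(truth), len(truth[0])
    cells = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    observed = {}
    for _ in range(min(budget, len(cells))):
        pool = [cell for cell in cells if cell not in observed]
        rng.shuffle(pool)  # random tie-breaking among equally uncertain cells
        target = min(pool, key=lambda cell: predict(observed, *cell, n_rows)[1])
        # "Running the experiment" is simulated by a lookup in the hidden table.
        observed[target] = truth[target[0]][target[1]]
    return observed


def complete(observed, n_rows, n_cols):
    """Full table: measured values where available, predictions elsewhere."""
    return [
        [
            observed[(r, c)] if (r, c) in observed
            else predict(observed, r, c, n_rows)[0]
            for c in range(n_cols)
        ]
        for r in range(n_rows)
    ]


# Hypothetical toy data: compounds in group 1 relocate proteins in group 1.
compound_group = [0, 0, 0, 1, 1, 1]
protein_group = [0, 0, 1, 1]
truth = [
    ["moved" if cg == 1 and pg == 1 else "unchanged" for pg in protein_group]
    for cg in compound_group
]

observed = active_learning(truth, budget=12)
table = complete(observed, len(truth), len(truth[0]))
```

Here `observed` holds the 12 simulated measurements and `table` fills the remaining cells with predictions (or `None` where the stand-in model has no evidence). Comparing `table` against `truth` for different budgets gives a rough sense of how much of the table must be measured before the rest is predicted well, which is the question the proposal poses at a much larger scale.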

Public Health Relevance

Current approaches for measuring the effects of drugs, especially when used in combination, are not able to address the large number of potential targets that these drugs may have. The proposed work will use a sophisticated probabilistic model and an active learning approach to demonstrate how such effects can be learned without measuring all possible combinations of drugs and targets. The work has the potential to dramatically change the way cell-based assays are used in drug discovery.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
3R01GM075205-03S1
Application #
7813483
Study Section
Special Emphasis Panel (ZRG1-BST-N (96))
Program Officer
Deatherage, James F
Project Start
2009-09-30
Project End
2011-05-31
Budget Start
2009-09-30
Budget End
2011-05-31
Support Year
3
Fiscal Year
2009
Total Cost
$510,300
Indirect Cost
Name
Carnegie-Mellon University
Department
Biology
Type
Schools of Arts and Sciences
DUNS #
052184116
City
Pittsburgh
State
PA
Country
United States
Zip Code
15213
Murphy, Robert F (2016) Building cell models and simulations from microscope images. Methods 96:33-39
Naik, Armaghan W; Kangas, Joshua D; Sullivan, Devin P et al. (2016) Active machine learning-driven experimentation to determine compound effects on protein patterns. Elife 5:e10047
Johnson, Gregory R; Li, Jieyue; Shariff, Aabid et al. (2015) Automated Learning of Subcellular Variation among Punctate Protein Patterns and a Generative Model of Their Relation to Microtubules. PLoS Comput Biol 11:e1004614
Kumar, Aparna; Rao, Arvind; Bhavani, Santosh et al. (2014) Automated analysis of immunohistochemistry images identifies candidate location biomarkers for cancers. Proc Natl Acad Sci U S A 111:18249-54
Kangas, Joshua D; Naik, Armaghan W; Murphy, Robert F (2014) Efficient discovery of responses of proteins to compounds using active learning. BMC Bioinformatics 15:143
Naik, Armaghan W; Kangas, Joshua D; Langmead, Christopher J et al. (2013) Efficient modeling and active learning discovery of biological responses. PLoS One 8:e83996
Coelho, Luis Pedro; Kangas, Joshua D; Naik, Armaghan W et al. (2013) Determining the subcellular location of new proteins from microscope images using local features. Bioinformatics 29:2343-9
Li, Jieyue; Xiong, Liang; Schneider, Jeff et al. (2012) Protein subcellular location pattern classification in cellular images using latent discriminative models. Bioinformatics 28:i32-9
Buck, Taráz E; Li, Jieyue; Rohde, Gustavo K et al. (2012) Toward the virtual cell: automated approaches to building models of subcellular organization "learned" from microscopy images. Bioessays 34:791-9
Li, Jieyue; Shariff, Aabid; Wiking, Mikaela et al. (2012) Estimating microtubule distributions from 2D immunofluorescence microscopy images reveals differences among human cultured cell lines. PLoS One 7:e50292

Showing the most recent 10 out of 31 publications