The primary goal of this proposal is to collect high-resolution information on the distribution of proteins within mammalian cells and to link it to nucleotide and protein sequences. It builds on extensive prior work by the co-PIs on developing protein-tagging methods and by the PI on developing software systems for automated analysis of subcellular patterns in fluorescence microscope images. Using high-throughput CD-tagging (protein-trapping) methods, 25,000 independent NIH 3T3 cell lines expressing GFP fusion proteins will be created. As each cell line is created, high-resolution fluorescence microscope images will be collected, and the gene and protein tagged in that line will be identified using high-throughput molecular analysis methods. The images will be subjected to automated image analysis to group proteins whose patterns are statistically indistinguishable. The location determined for each protein will be compared with any information available from protein databases, journal articles and location predictors, and each assigned location will be accompanied by a confidence estimate derived from combining these sources. In addition, the images for each protein group will be used to build generative models that can synthesize new protein distributions statistically equivalent to those in the original images. The ability to synthesize distributions will provide an important structural framework for systems biology modeling of cell behavior in normal and disease states.
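The abstract does not say which statistical test decides that two proteins have "statistically indistinguishable patterns." One standard choice for asking whether two sets of multivariate measurements share the same mean is the two-sample Hotelling's T² test. The sketch below assumes, hypothetically, that each imaged cell has already been reduced to a fixed-length numeric feature vector and that two proteins are compared by testing their sets of per-cell vectors. The function name `hotelling_f` and its helpers are illustrative, not part of any system described here.

```python
"""Two-sample Hotelling T^2 test on per-cell feature vectors (illustrative sketch)."""
from typing import List, Sequence

Vector = Sequence[float]


def _mean(rows: List[Vector]) -> List[float]:
    # Column-wise mean of the feature vectors.
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def _scatter(rows: List[Vector], mu: List[float]) -> List[List[float]]:
    # Sum of outer products of centered rows (unnormalized covariance).
    p = len(mu)
    s = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[j] - mu[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                s[a][b] += d[a] * d[b]
    return s


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    # Gaussian elimination with partial pivoting; solves a @ x = b.
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("pooled covariance is singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def hotelling_f(x: List[Vector], y: List[Vector]) -> float:
    """F statistic for H0: both cell sets share the same mean feature vector.

    Under H0 (multivariate normal features, equal covariances) the value
    follows an F distribution with p and n1 + n2 - p - 1 degrees of freedom.
    """
    n1, n2, p = len(x), len(y), len(x[0])
    if n1 + n2 - p - 1 <= 0:
        raise ValueError("need more cells than features")
    mx, my = _mean(x), _mean(y)
    sx, sy = _scatter(x, mx), _scatter(y, my)
    # Pooled within-group covariance estimate.
    pooled = [[(sx[a][b] + sy[a][b]) / (n1 + n2 - 2) for b in range(p)]
              for a in range(p)]
    diff = [mx[j] - my[j] for j in range(p)]
    w = _solve(pooled, diff)  # pooled^-1 @ diff without forming the inverse
    t2 = (n1 * n2 / (n1 + n2)) * sum(d * wi for d, wi in zip(diff, w))
    return (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
```

A caller would compare the returned value with the upper critical value of the F distribution with p and n1 + n2 - p - 1 degrees of freedom at a chosen significance level; proteins whose pairwise comparisons stay below that threshold could then be placed in the same group. The feature set, significance level and any correction for multiple comparisons are assumptions not stated in the abstract.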

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Deatherage, James F
Carnegie-Mellon University
Schools of Arts and Sciences
United States
Murphy, Robert F (2016) Building cell models and simulations from microscope images. Methods 96:33-39
Naik, Armaghan W; Kangas, Joshua D; Sullivan, Devin P et al. (2016) Active machine learning-driven experimentation to determine compound effects on protein patterns. Elife 5:e10047
Johnson, Gregory R; Li, Jieyue; Shariff, Aabid et al. (2015) Automated Learning of Subcellular Variation among Punctate Protein Patterns and a Generative Model of Their Relation to Microtubules. PLoS Comput Biol 11:e1004614
Kumar, Aparna; Rao, Arvind; Bhavani, Santosh et al. (2014) Automated analysis of immunohistochemistry images identifies candidate location biomarkers for cancers. Proc Natl Acad Sci U S A 111:18249-54
Kangas, Joshua D; Naik, Armaghan W; Murphy, Robert F (2014) Efficient discovery of responses of proteins to compounds using active learning. BMC Bioinformatics 15:143
Naik, Armaghan W; Kangas, Joshua D; Langmead, Christopher J et al. (2013) Efficient modeling and active learning discovery of biological responses. PLoS One 8:e83996
Coelho, Luis Pedro; Kangas, Joshua D; Naik, Armaghan W et al. (2013) Determining the subcellular location of new proteins from microscope images using local features. Bioinformatics 29:2343-9
Li, Jieyue; Xiong, Liang; Schneider, Jeff et al. (2012) Protein subcellular location pattern classification in cellular images using latent discriminative models. Bioinformatics 28:i32-9
Buck, Taráz E; Li, Jieyue; Rohde, Gustavo K et al. (2012) Toward the virtual cell: automated approaches to building models of subcellular organization "learned" from microscopy images. Bioessays 34:791-9
Li, Jieyue; Shariff, Aabid; Wiking, Mikaela et al. (2012) Estimating microtubule distributions from 2D immunofluorescence microscopy images reveals differences among human cultured cell lines. PLoS One 7:e50292

The 10 most recent of 31 publications are listed.