The primary goal of this proposal is to collect high-resolution information on the distribution of proteins within mammalian cells and to link it to nucleotide and protein sequences. It builds on extensive prior work by the co-PIs on the development of protein tagging methods and by the PI on the development of software systems for automated analysis of subcellular patterns in fluorescence microscope images. Using high-throughput CD-tagging (protein-trapping) methods, 25,000 independent NIH 3T3 cell lines expressing GFP-protein fusions will be created. As the cell lines are created, high-resolution fluorescence microscope images will be collected, and the gene and protein tagged in each cell line will be identified by high-throughput molecular analysis. The images will be subjected to automated, computerized image analysis to group proteins whose patterns are statistically indistinguishable. The location determined for each protein will be compared with the information available from protein databases, journal articles, and location predictors, and each assigned location will be accompanied by a confidence estimate derived from combining these sources. In addition, the images for each protein group will be used to build generative models that can synthesize new protein distributions statistically equivalent to those in the original images. The ability to synthesize such distributions will provide an important structural framework for systems biology modeling of cell behavior in normal and disease states.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
3R01GM075205-03S2
Application #
8000191
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Deatherage, James F
Project Start
2010-01-14
Project End
2010-12-31
Budget Start
2010-01-14
Budget End
2010-12-31
Support Year
3
Fiscal Year
2010
Total Cost
$123,328
Name
Carnegie-Mellon University
Department
Biology
Type
Schools of Arts and Sciences
DUNS #
052184116
City
Pittsburgh
State
PA
Country
United States
Zip Code
15213
Murphy, Robert F (2016) Building cell models and simulations from microscope images. Methods 96:33-39
Naik, Armaghan W; Kangas, Joshua D; Sullivan, Devin P et al. (2016) Active machine learning-driven experimentation to determine compound effects on protein patterns. Elife 5:e10047
Johnson, Gregory R; Li, Jieyue; Shariff, Aabid et al. (2015) Automated Learning of Subcellular Variation among Punctate Protein Patterns and a Generative Model of Their Relation to Microtubules. PLoS Comput Biol 11:e1004614
Kumar, Aparna; Rao, Arvind; Bhavani, Santosh et al. (2014) Automated analysis of immunohistochemistry images identifies candidate location biomarkers for cancers. Proc Natl Acad Sci U S A 111:18249-54
Kangas, Joshua D; Naik, Armaghan W; Murphy, Robert F (2014) Efficient discovery of responses of proteins to compounds using active learning. BMC Bioinformatics 15:143
Naik, Armaghan W; Kangas, Joshua D; Langmead, Christopher J et al. (2013) Efficient modeling and active learning discovery of biological responses. PLoS One 8:e83996
Coelho, Luis Pedro; Kangas, Joshua D; Naik, Armaghan W et al. (2013) Determining the subcellular location of new proteins from microscope images using local features. Bioinformatics 29:2343-9
Li, Jieyue; Shariff, Aabid; Wiking, Mikaela et al. (2012) Estimating microtubule distributions from 2D immunofluorescence microscopy images reveals differences among human cultured cell lines. PLoS One 7:e50292
Li, Jieyue; Newberg, Justin Y; Uhlen, Mathias et al. (2012) Automated analysis and reannotation of subcellular locations in confocal images from the Human Protein Atlas. PLoS One 7:e50514
Cho, Baek Hwan; Cao-Berg, Ivan; Bakal, Jennifer Ann et al. (2012) OMERO.searcher: content-based image search for microscope images. Nat Methods 9:633-4

The 10 most recent of 31 publications are listed above.