The overarching goal of the proposed project is to develop computational methods and practical bioinformatics resources for data-driven, integrative analysis of expression images and sequence data in order to discover functional, genetic, and regulatory interactions between genes and genomic elements. Fast-growing collections of spatial and temporal gene expression patterns in the model organism Drosophila melanogaster provide unprecedented opportunities to understand the spatiotemporal regulation of expression, not only of fruit fly genes but also of human genes that show extensive evolutionary similarity and functional conservation. These expression patterns are the first links between a gene's primary sequence and its influence on the phenotype, and overlaps between patterns provide initial clues to functional, genetic, or regulatory interactions. Therefore, our primary framework for translating large volumes of images into functional knowledge is to discover and analyze co-expressed (and thus potentially co-regulated) genes. Because the sheer volume of available images has made the standard practice of manual inspection infeasible, our efforts to date have produced a unique and innovative image-based framework, FlyExpress, for high-throughput analysis of these large datasets. We are now poised to address a growing and urgent need for computational tools and data-integration methods that effectively harness fast-growing image and sequence data and that foster greater engagement of the research community in building the FlyExpress knowledgebase.
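As a rough illustration of how spatial overlap can be turned into a ranked list of candidate co-expressed genes, the sketch below scores binarized expression patterns against a query pattern with the Jaccard index. It assumes the images have already been registered to a common embryo grid and binarized; the grid representation, the helper names (`expressed_cells`, `jaccard`, `rank_by_overlap`), the gene names, and the choice of Jaccard as the overlap score are all hypothetical and are not the scoring method used by FlyExpress.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical representation: a registered, binarized embryo image on a shared grid,
# where 1 = expressed cell and 0 = not expressed.
Grid = Sequence[Sequence[int]]


def expressed_cells(grid: Grid) -> Set[Tuple[int, int]]:
    """Return the (row, col) coordinates of expressed cells in a binarized image."""
    return {(r, c) for r, row in enumerate(grid) for c, v in enumerate(row) if v}


def jaccard(a: Grid, b: Grid) -> float:
    """Overlap score |A & B| / |A | B|; 0.0 when neither pattern has any expression."""
    cells_a, cells_b = expressed_cells(a), expressed_cells(b)
    union = cells_a | cells_b
    if not union:
        return 0.0
    return len(cells_a & cells_b) / len(union)


def rank_by_overlap(query: Grid, library: Dict[str, Grid]) -> List[Tuple[str, float]]:
    """Rank library genes by overlap with the query pattern, highest score first."""
    scores = [(gene, jaccard(query, grid)) for gene, grid in library.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy 2x4 "embryo" grids with made-up gene names; not real data.
    query = [[1, 1, 0, 0],
             [1, 1, 0, 0]]
    library = {
        "geneA": [[1, 1, 1, 0], [1, 1, 0, 0]],  # anterior, overlaps heavily
        "geneB": [[0, 0, 1, 1], [0, 0, 1, 1]],  # posterior, no overlap
        "geneC": [[1, 0, 0, 0], [0, 0, 0, 0]],  # small anterior spot
    }
    for gene, score in rank_by_overlap(query, library):
        print(f"{gene}\t{score:.2f}")
```

Run as a script, this prints geneA (0.80), then geneC (0.25), then geneB (0.00). Jaccard is used here only because it is symmetric and penalizes expression outside the shared region; a real pipeline would also need image registration, developmental stage matching, and a statistical treatment of chance overlap, none of which this sketch attempts.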
To meet this need, we plan to (a) develop a new software tool that enables effective expression image analysis while advancing community collaboration, (b) translate knowledge of spatiotemporal expression overlap into the discovery of regulatory motifs by developing novel methods for integrative analysis of image and sequence data, and (c) evolve FlyExpress into a comprehensive knowledgebase of embryonic expression images in order to generate better predictions and support integrative analysis across heterogeneous image sources. These developments will enable investigators to generate and evaluate gene interaction hypotheses based on overlaps in expression patterns, drawing on all relevant biological information. The software tool and web system, including the source code, will always be freely available. The computational algorithms, statistical methods, and bioinformatics technologies developed in this project will be reconfigurable and adaptable for constructing similar frameworks to organize expression pattern data from other species and life-history stages. The FlyExpress system will serve the day-to-day needs of basic and applied researchers, as well as students, in many areas of molecular biology crucial to basic biomedical research, including computational genomics, molecular genetics, developmental biology, genetics, and evolution.

Public Health Relevance

Investigations of model organisms are critical for understanding the spatiotemporal regulation of gene expression that results in alternative cell fates in a developing embryo and establishes the cellular precursors of adult tissues and organs. The proposed project will produce urgently needed computational methods and practical bioinformatics resources that enable scientists to carry out integrative, data-driven analysis of expression pattern images to discover functional, genetic, and regulatory interactions between genes and genomic elements. These advances should lead to a more effective translation of gene expression (image) and genomics (sequence) data into functional knowledge of human and other animal genes.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Bonazzi, Vivien
Arizona State University-Tempe Campus
Organized Research Units
United States
Stanley Jr, Craig E; Kulathinal, Rob J (2016) flyDIVaS: A Comparative Genomics Resource for Drosophila Divergence and Selection. G3 (Bethesda) 6:2355-63
Wisotzkey, Robert G; Quijano, Janine C; Stinchfield, Michael J et al. (2014) New gene evolution in the bonus-TIF1-γ/TRIM33 family impacted the architecture of the vertebrate dorsal-ventral patterning network. Mol Biol Evol 31:2309-21
Montiel, Ivan; Konikoff, Charlotte; Braun, Bremen et al. (2014) myFX: a turn-key software for laboratory desktops to analyze spatial patterns of gene expression in Drosophila embryos. Bioinformatics 30:1319-21
Yuan, Lei; Pan, Cheng; Ji, Shuiwang et al. (2014) Automated annotation of developmental stages of Drosophila embryos in images containing spatial patterns of expression. Bioinformatics 30:266-73
Zhang, Wenlu; Feng, Daming; Li, Rongjian et al. (2013) A mesh generation and machine learning framework for Drosophila gene expression pattern image analysis. BMC Bioinformatics 14:372
Shimmi, Osamu; Newfeld, Stuart J (2013) New insights into extracellular and post-translational regulation of TGF-β family signalling pathways. J Biochem 154:11-9
Sun, Qian; Muckatira, Sherin; Yuan, Lei et al. (2013) Image-level and group-level models for Drosophila gene expression pattern annotation. BMC Bioinformatics 14:350
Chen, Jianhui; Tang, Lei; Liu, Jun et al. (2013) A convex formulation for learning a shared predictive structure from multiple tasks. IEEE Trans Pattern Anal Mach Intell 35:1025-38
Wisotzkey, Robert G; Konikoff, Charlotte E; Newfeld, Stuart J (2012) Hippo pathway phylogenetics predicts monoubiquitylation of Salvador and Merlin/Nf2. PLoS One 7:e51599
Li, Ying-Xin; Ji, Shuiwang; Kumar, Sudhir et al. (2012) Drosophila gene expression pattern annotation through multi-instance multi-label learning. IEEE/ACM Trans Comput Biol Bioinform 9:98-112

The 10 most recent of the project's 28 publications are listed above.