Large-scale in situ hybridization (ISH) screens are providing an abundance of data on spatio-temporal patterns of gene expression, which are valuable for understanding the mechanisms of gene regulation. Knowledge gained from the analysis of Drosophila expression patterns is widely important because a large number of genes involved in fruit fly development are also found in humans and other species. Research into the spatial and temporal characteristics of Drosophila gene expression images has therefore been at the leading edge of scientific investigations into the fundamental principles of development across species. Drosophila gene expression pattern images enable the integration of spatial expression patterns with other genomic datasets that link regulators with their downstream targets. This project addresses the computational challenges in analyzing Drosophila gene expression patterns by leveraging a new bioinformatics software system. It focuses on designing principled bioinformatics and computational biology algorithms and tools that will integrate multimodal spatial patterns of gene expression for developmental stage recognition and anatomical ontology term annotation of Drosophila embryos, and that will infer gene interaction networks to generate a more comprehensive picture of gene function and interaction. The bioinformatics methods resulting from the project activities are broadly applicable to a variety of fields, such as biomedical science and engineering, systems biology, clinical pathology, oncology, and pharmaceutics. To broaden participation in science, novel tools are planned to enhance courses and research experiences for diverse populations of students.
This project investigates three challenging problems in the study of Drosophila embryo ISH images via innovative bioinformatics algorithms: 1) a sparse multi-dimensional feature learning method to integrate multimodal spatial gene expression patterns for annotating Drosophila ISH images; 2) heterogeneous multi-task learning models that use a high-order relational graph to jointly recognize developmental stages and annotate anatomical ontology terms; and 3) an embedded sparse representation algorithm to infer the gene interaction network. Applying structured sparse learning, multi-task learning, and high-order relational graph models to the analysis of Drosophila gene expression patterns is innovative and holds great promise for scientific investigations into the fundamental principles of animal development. The algorithms and tools resulting from this research are expected to aid knowledge discovery in broader scientific and biological domains with massive, high-dimensional, and heterogeneous datasets. This project facilitates the development of novel educational tools to enhance several current courses at the University of Texas at Arlington. The PIs engage minority students and under-served populations in research activities to provide exposure to cutting-edge scientific research. For further information, see the website at: http://ranger.uta.edu/~heng/NSF-DBI-1356628.html