Recent advances in high-throughput bio-imaging technologies are enabling scientists to capture the spatiotemporal patterns of gene expression in cells, organs and individuals, in an effort to build a more comprehensive picture of genome function. Images capturing gene expression and protein localization patterns now have unprecedented spatial resolution, yielding high-quality maps (expression images) in model organisms. However, the computational biology of gene expression patterns lags far behind genome informatics. Automated, efficient tools for analyzing expression images are a prerequisite for generating biological insights into gene functions, interactions and networks for the next generation of scientists. This project focuses on developing novel tools and techniques for large-scale biological annotation and comparative analysis of gene expression patterns in the early development of a model organism, Drosophila melanogaster (the fruit fly). This choice is important because the fruit fly is a canonical model organism for understanding the development of humans and other animals. The availability of more than 100,000 images capturing gene expression patterns provides an opportunity to examine similarities and differences in the expression of fruit fly genes in developing embryos; many of these genes show very high sequence and biochemical-function similarity to human proteins. The proposed research has three primary aims: (a) Develop machine learning methods that use the existing, but coarse, developmental stage knowledge for training and produce refined and new stage information for existing and future images.
Knowing the precise developmental stage is important because it enables biologically meaningful mining of genes with similar spatial patterns, calculation of the developmental trajectories of gene expression, stage-sensitive textual annotation of the expression patterns captured in images, and construction of genome-wide expression pattern maps at critical junctures in development. (b) Develop machine learning methods that describe expression patterns in words using the existing controlled vocabulary. These descriptions will allow efficient text-mining tools to identify genes expressed in similar organs and their precursors. They will also enable better comparison of expression patterns across species, because many efforts in the scientific community are linking organism-specific controlled vocabularies to one another. (c) Develop transfer learning techniques for stage and text annotation that can be applied to images generated by future techniques. This is important because machine learning approaches traditionally assume that the training data and test data are drawn from the same distribution. However, new bio-imaging techniques are producing higher-resolution data with substantially different color and intensity distributions, so methods that remain robust across imaging techniques are needed.
This project focuses on the analysis of gene expression patterns in the early development of Drosophila melanogaster. The fruit fly is a canonical model organism for understanding the development of humans and other animals. The proposed research will explicate the function and interconnection of animal genes and lead to a better understanding of human diseases.