Carnegie Mellon University is awarded a grant to develop novel machine learning and data mining methods for finding spatio-temporal patterns of gene expression in the complex biological contexts of higher eukaryotic organisms. The focus will be on mining in situ hybridization (ISH) images of multi-cell systems and on capturing the cell-level histological context of gene expression in Drosophila embryos. The challenge is to design effective feature extraction functions, distance measures, text/image data fusion methods, and spatio-temporal models, all of which, to our knowledge, remain very underdeveloped for embryonic ISH images in Drosophila. The main novelties are: (a) more salient feature extraction based on state-of-the-art image processing techniques such as mathematical morphology, a variety of filters, wavelets, and graph-theoretic and probabilistic segmentation; (b) novel graph-based methods and probabilistic models for image/text fusion and cross-modal querying; (c) novel latent-space models that capture higher-level "semantic similarity", rather than direct feature similarity, between functionally or behaviorally similar genes in variable morphological contexts; and (d) Kalman filters and non-linear dynamic models to model and predict the spatio-temporal evolution of gene expression. Moreover, a repository of ISH images will be created from public sources, covering the expression of all known genes in the Drosophila genome. The structured database can be searched 'by image example' or by keyword (all automatically derived). Systematic profiling of ISH images of gene expression patterns will attract high interest, and powerful, sophisticated computer algorithms will be needed to analyze these data. The proposed system will meet these needs and will provide a better understanding of embryo development, as well as of which genes and proteins affect which processes.
This project is a necessary step towards the ultimate long-term goal of understanding the molecular mechanism of embryo development and unraveling the gene regulation network involved in this process. It also offers a new image-based platform for genetics and developmental biology education.

Agency
National Science Foundation (NSF)
Institute
Division of Biological Infrastructure (DBI)
Application #
0640543
Program Officer
Julie Dickerson
Project Start
Project End
Budget Start
2007-08-15
Budget End
2011-07-31
Support Year
Fiscal Year
2006
Total Cost
$1,331,995
Indirect Cost
Name
Carnegie-Mellon University
Department
Type
DUNS #
City
Pittsburgh
State
PA
Country
United States
Zip Code
15213